This chapter introduces data handling using Pandas, covering the basics of Python libraries, Series and DataFrames, their creation, operations, and importing/exporting data with CSV files.
Python libraries are collections of built-in modules facilitating a variety of tasks without needing extensive programming effort. Key libraries for data science and analysis include:
While both are essential for data manipulation, Pandas offers significant advantages:
Pandas can be installed using the command:
pip install pandas
Ensure that Python is already installed on your system as a prerequisite.
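A quick way to confirm the installation worked is to import the library and print its version. This is only a minimal sketch; the version string printed depends on what pip installed on your system.

```python
import pandas as pd

# If the import succeeds, Pandas is installed; the version shown will vary
installed_version = pd.__version__
print("Pandas version:", installed_version)
```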
Two main data structures are explored: Series (one-dimensional) and DataFrame (two-dimensional).
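import pandas as pd
# Create a Series from a list
series1 = pd.Series([10, 20, 30])
# Create a Series from a NumPy array
import numpy as np
array1 = np.array([1, 2, 3])
series2 = pd.Series(array1)
# Create a Series from a dictionary (keys become the index)
dict1 = {'India': 'NewDelhi', 'UK': 'London'}
series3 = pd.Series(dict1)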
Elements can be accessed via indexing and slicing:
Slicing uses the [start:end] syntax for a numeric index, where the end position is excluded, whereas label slices include the end label.
Key attributes include size, index, and values; methods such as head(), tail(), and count() help inspect and summarize the data.
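The difference between positional and label slicing, along with the attributes and methods listed above, can be sketched as follows. The Series contents here are hypothetical and extend the capital-city example purely for illustration.

```python
import pandas as pd

# Hypothetical data: capital cities indexed by country name
capitals = pd.Series(
    ["NewDelhi", "London", "Paris", "Tokyo"],
    index=["India", "UK", "France", "Japan"],
)

# Positional slice: positions 1 and 2 only, the end position is excluded
by_position = capitals[1:3]

# Label slice: every label from "UK" through "Japan", the end label is included
by_label = capitals["UK":"Japan"]

print(by_position)
print(by_label)

# Attributes describing the Series
print(capitals.size)    # number of elements
print(capitals.index)   # the index labels
print(capitals.values)  # the underlying data as an array

# Methods for inspecting the Series
print(capitals.head(2))  # first two elements
print(capitals.tail(2))  # last two elements
print(capitals.count())  # number of non-missing elements
```

Note that the positional slice returns two elements while the label slice returns three, because only the label slice includes its end point.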
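# Create a DataFrame from a dictionary of lists (keys become column names)
data = {'Column1': [1, 2], 'Column2': [3, 4]}
df = pd.DataFrame(data)
# Create a DataFrame from a dictionary of Series (keys become column names)
ResultSheet = {'Arnab': pd.Series([...]), ...}
df = pd.DataFrame(ResultSheet)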
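When a DataFrame is built from a dictionary of Series, Pandas aligns the Series on their index labels and fills any gaps with NaN. The sketch below uses hypothetical student names, subjects, and marks (none are taken from the chapter) to show that alignment.

```python
import pandas as pd

# Hypothetical marks: names, subjects, and numbers are illustrative only
result_sheet = {
    "StudentA": pd.Series([90, 91, 97], index=["Maths", "Science", "Hindi"]),
    "StudentB": pd.Series([92, 81], index=["Maths", "Science"]),
}

# Each dictionary key becomes a column; the row index is the union of labels
result_df = pd.DataFrame(result_sheet)
print(result_df)

# StudentB has no "Hindi" entry, so that cell is a missing value (NaN)
print(result_df.loc["Hindi", "StudentB"])
```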
Use loc[] to assign or change values.
Use the drop() method to remove rows or columns.
Use the rename() method to change row or column labels.
Pandas facilitates easy loading of data from CSV files using read_csv() and exporting DataFrames to CSV using to_csv().
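Before the CSV example, here is a minimal sketch of the DataFrame operations named above: loc[], drop(), and rename(). The row labels and values are hypothetical, chosen only to make each step visible.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {"Column1": [1, 2, 3], "Column2": [4, 5, 6]},
    index=["a", "b", "c"],
)

# loc[] changes an existing value or adds a new row by label
df.loc["a", "Column1"] = 10
df.loc["d"] = [7, 8]

# drop() returns a new DataFrame; axis=0 removes rows, axis=1 removes columns
without_row = df.drop("b", axis=0)
without_col = df.drop("Column2", axis=1)

# rename() returns a new DataFrame with the given labels replaced
renamed = df.rename(columns={"Column1": "First"}, index={"a": "row_a"})

print(df)
print(without_row)
print(without_col)
print(renamed)
```

Both drop() and rename() leave the original DataFrame unchanged by default, so the results are assigned to new names here.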
Example:
import pandas as pd
marks = pd.read_csv('path/to/file.csv')
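A full round trip, writing a DataFrame with to_csv() and reading it back with read_csv(), might look like the sketch below. The file name marks.csv, the column names, and the values are assumptions made for illustration; a temporary directory is used so the example does not depend on any existing file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical marks table
marks = pd.DataFrame({"Name": ["A", "B"], "Maths": [90, 85]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "marks.csv")

    # index=False keeps the row index out of the file
    marks.to_csv(csv_path, index=False)

    # Read the same file back into a new DataFrame
    loaded = pd.read_csv(csv_path)

print(loaded)
```

Passing index=False is a design choice: without it, to_csv() also writes the row index as an extra first column, which then shows up as an unnamed column when the file is read back.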
This chapter illustrates how to manipulate, analyze, and visualize data using Pandas, equipping students with essential data handling skills.
Use pip install pandas to install the library.
Import data using read_csv() and export it using to_csv().