Data Analysis Using Pandas in Python
What is Pandas?
Pandas is a powerful open-source Python library used for data analysis and manipulation. It provides two primary data structures:
- Series – One-dimensional labeled array
- DataFrame – Two-dimensional labeled data structure
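The two structures are closely related. The sketch below uses small made-up data to show that selecting a single column of a DataFrame gives a Series that shares the DataFrame's row index, and that a single row is also returned as a Series, indexed by the column names.

```python
import pandas as pd

# A small DataFrame built from a dict of equal-length lists (illustrative data)
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Selecting one column returns a Series that shares the DataFrame's row index
ages = df['Age']
print(type(ages).__name__)    # Series
print(list(ages.index))       # [0, 1]

# A row selected by position is also a Series, indexed by the column names
first_row = df.iloc[0]
print(list(first_row.index))  # ['Name', 'Age']
```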
Pandas is widely used for:
- Data cleaning
- Exploratory Data Analysis (EDA)
- Data transformation
- Time-series analysis
Install Pandas
pip install pandas
Importing Pandas
import pandas as pd
Creating DataFrames and Series
Series
s = pd.Series([10, 20, 30, 40])
print(s)
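By default a Series is labeled 0, 1, 2, and so on. The sketch below passes explicit labels through the `index` parameter (the labels 'a' to 'd' are arbitrary) and shows that arithmetic and boolean filtering work element-wise while keeping those labels.

```python
import pandas as pd

# Series with explicit index labels instead of the default 0..n-1
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(s['b'])       # label-based lookup: 20
doubled = s * 2     # element-wise arithmetic, labels preserved
large = s[s > 15]   # boolean mask keeps labels 'b', 'c', 'd'
print(doubled)
print(large)
```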
DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo']
}
df = pd.DataFrame(data)
print(df)
Reading Data from Files
CSV File
df = pd.read_csv('data.csv')
Excel File
df = pd.read_excel('data.xlsx')  # .xlsx files need an engine such as openpyxl installed
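`read_csv` accepts any file-like object as well as a path, which makes it easy to try without a `data.csv` on disk. In this sketch the CSV contents are hypothetical sample data held in an `io.StringIO` buffer; `usecols` loads only the listed columns.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
"""

# Read every column; pandas infers a dtype for each one
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# Read only a subset of columns
names_ages = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Age'])
print(names_ages)
```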
Viewing and Inspecting Data
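df.head()      # First 5 rows
df.tail(3)     # Last 3 rows
df.info()      # Column names, dtypes, non-null counts, memory usage
df.describe()  # Summary statistics (count, mean, std, min, quartiles, max) of numeric columns
df.columns     # Column labels
df.shape       # Tuple of (rows, columns)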
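Applied to the sample DataFrame from the earlier section, these calls give concrete results. Note that `describe()` with its default arguments summarizes only the numeric columns, so here it reports on `Age` alone.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo'],
})

print(df.shape)          # (3, 3): 3 rows, 3 columns
print(list(df.columns))  # ['Name', 'Age', 'City']
print(df.head(2))        # first 2 rows

stats = df.describe()    # numeric columns only by default (here: Age)
print(stats.loc['mean', 'Age'])  # 30.0
```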
Selecting Data
By Column
df['Name'] # Single column
df[['Name', 'City']] # Multiple columns
By Row
df.iloc[0]  # First row, selected by integer position
df.loc[1]   # Row whose index label is 1
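With the default 0, 1, 2 index, `iloc` and `loc` look interchangeable. The difference appears once the index labels are not positions. This sketch uses the arbitrary labels 10, 20, 30 to show position-based versus label-based access, including the different slicing rules.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # non-default labels, chosen only for illustration
)

print(df.iloc[0]['Name'])     # position 0 -> 'Alice'
print(df.loc[20, 'Name'])     # label 20 -> 'Bob'

by_label = df.loc[10:20, 'Name']  # label slices include both endpoints
by_position = df.iloc[0:2]        # position slices exclude the stop position
print(by_label)
print(by_position)
```

Here `df.loc[1]` would raise a `KeyError`, because no row carries the label 1.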
Filtering and Querying Data
# Filter rows where Age > 30
df[df['Age'] > 30]
# Filter by multiple conditions
df[(df['Age'] > 25) & (df['City'] == 'London')]
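The same filters can also be written as string expressions with `DataFrame.query`, which some readers find easier to scan than chained boolean masks. Local Python variables are referenced inside the expression with `@`. The `min_age` threshold below is an arbitrary example value.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo'],
})

# Equivalent to df[df['Age'] > 30]
older = df.query('Age > 30')

# Equivalent to the two-condition mask; 'and' replaces '&'
london = df.query("Age > 25 and City == 'London'")

# Reference a local variable with @
min_age = 28
at_least_min = df.query('Age >= @min_age')
print(at_least_min)
```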
Modifying Data
Adding New Columns
df['Salary'] = [50000, 60000, 70000]
Updating Values
df.at[0, 'Age'] = 26
Renaming Columns
df.rename(columns={'Name': 'Full Name'}, inplace=True)
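The edits above can be combined with two common patterns: updating every row that matches a condition with `.loc[mask, column]`, and deriving a new column from an existing one with vectorized arithmetic. This sketch also reassigns the result of `rename()` instead of passing `inplace=True`; both styles work. The `AgeNextYear` column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Conditional update: set Age for every row where Name is 'Bob'
df.loc[df['Name'] == 'Bob', 'Age'] = 31

# New column computed from an existing one, no explicit loop
df['AgeNextYear'] = df['Age'] + 1

# rename() returns a new DataFrame; reassigning replaces the old one
df = df.rename(columns={'Name': 'Full Name'})
print(df)
```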
Handling Missing Data
df.isnull()  # Boolean DataFrame: True where a value is missing
df.dropna()  # New DataFrame without the rows that contain any missing value
df.fillna(0) # New DataFrame with missing values replaced by 0
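In practice it helps to count missing values per column first, and to fill each column with a value that suits it rather than a single 0. The sketch below uses made-up data with one missing `Age` and one missing `City`. It fills `Age` with the column mean and `City` with a placeholder string. Neither `dropna()` nor `fillna()` modifies `df`; both return new DataFrames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'City': ['New York', None, 'Tokyo'],
})

# Number of missing values in each column
missing_counts = df.isnull().sum()
print(missing_counts)

# Drop Bob's row, which has missing values
cleaned = df.dropna()

# Per-column fill values; the mean skips NaN, so it is (25 + 35) / 2
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(filled)
```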
Grouping and Aggregating Data
df.groupby('City')['Age'].mean()
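`groupby` is not limited to a single statistic. This sketch, on hypothetical data, computes several aggregations in one call with `agg`. It also uses named aggregation to choose the output column names (`avg_age` and `n` are arbitrary names). Group keys are sorted in the result by default.

```python
import pandas as pd

# Hypothetical data: London ages 25, 35, 45; Tokyo ages 30, 50
df = pd.DataFrame({
    'City': ['London', 'Tokyo', 'London', 'Tokyo', 'London'],
    'Age': [25, 30, 35, 50, 45],
})

# Several statistics at once; one row per City
summary = df.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)

# Named aggregation: output_name=(column, function)
named = df.groupby('City').agg(avg_age=('Age', 'mean'), n=('Age', 'size'))
print(named)
```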
Sorting Data
df.sort_values(by='Age', ascending=False)
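`sort_values` also accepts a list of columns, with a matching list of `ascending` flags. This sketch, on hypothetical data, sorts by city A to Z and then by age from oldest to youngest within each city. `sort_index()` then restores the original row order.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'City': ['London', 'Tokyo', 'London', 'Tokyo'],
    'Age': [25, 30, 35, 28],
})

# City ascending, then Age descending within each City
ordered = df.sort_values(by=['City', 'Age'], ascending=[True, False])
print(ordered)

# Sorting by the index label brings back the original order
restored = ordered.sort_index()
```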
Merging and Joining DataFrames
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['A', 'B']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [85, 90]})
result = pd.merge(df1, df2, on='ID')
print(result)
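`pd.merge` performs an inner join by default, so only keys present in both DataFrames survive. When the key sets differ, the `how` parameter controls which rows are kept. `indicator=True` adds a `_merge` column that records where each row came from. The IDs and scores below are made up to create non-matching keys.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['A', 'B', 'C']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 70]})

# Default how='inner': only IDs 1 and 2 appear in both
inner = pd.merge(df1, df2, on='ID')

# how='left': every df1 row is kept; ID 3 has no Score, so it becomes NaN
left = pd.merge(df1, df2, on='ID', how='left')

# how='outer' keeps all IDs; _merge shows 'both', 'left_only' or 'right_only'
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer)
```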
Exporting Data
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)  # .xlsx output needs an engine such as openpyxl installed
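To check what `index=False` does without writing files, this sketch writes to an in-memory `io.StringIO` buffer in place of `output.csv`, then reads it back with `read_csv`. The data is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # omit the 0, 1, ... row labels
print(buffer.getvalue())

# Rewind and read the CSV back into a DataFrame
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip)
```

Without `index=False`, the row labels are written as an extra first column, which reads back as a column named `Unnamed: 0`.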