Data Analysis Using Pandas in Python


What is Pandas?

Pandas is an open-source Python library for data analysis and manipulation. It provides two primary data structures:

  • Series – One-dimensional labeled array
  • DataFrame – Two-dimensional labeled data structure

Pandas is widely used in:

  • Data cleaning
  • Exploratory Data Analysis (EDA)
  • Data transformation
  • Time-series analysis

Install Pandas

pip install pandas

Importing Pandas

import pandas as pd


Creating DataFrames and Series

Series

s = pd.Series([10, 20, 30, 40])
print(s)
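
Because a Series is a labeled array, it can also be given explicit index labels. The following is a minimal sketch; the labels 'a' through 'd' are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical labels 'a'..'d', chosen only for illustration
labeled = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = labeled['b']          # look up by index label
by_position = labeled.iloc[1]    # look up by integer position
print(by_label, by_position)
```

Both lookups refer to the same element here; they differ only in how the element is addressed.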

DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo']
}

df = pd.DataFrame(data)
print(df)


Reading Data from Files

CSV File

df = pd.read_csv('data.csv')

Excel File

df = pd.read_excel('data.xlsx')   # .xlsx files need an Excel engine such as openpyxl installed
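
To try read_csv without a file on disk, the CSV text can be wrapped in an in-memory buffer. This is a sketch under the assumption that the file looks like the small table below; the contents are invented for illustration.

```python
import io
import pandas as pd

# Made-up CSV contents standing in for 'data.csv'
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,London\n"

# read_csv accepts a path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.shape)
print(df_csv.dtypes)
```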


Viewing and Inspecting Data

df.head()          # First 5 rows (default)
df.tail(3)         # Last 3 rows
df.info()          # Column names, dtypes, and non-null counts
df.describe()      # Summary statistics for numeric columns
df.columns         # Column names
df.shape           # (rows, columns)


Selecting Data

By Column

df['Name']             # Single column
df[['Name', 'City']]   # Multiple columns

By Row

df.iloc[0]      # First row (by integer position)
df.loc[1]       # Row whose index label is 1
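
With the default index (0, 1, 2, ...), iloc and loc look interchangeable. The difference shows up once the index labels are not the positions. A small sketch, assuming a hypothetical index of 10, 20, 30:

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default index labels
)

first_row = people.iloc[0]   # position 0, regardless of label
row_20 = people.loc[20]      # the row labeled 20
print(first_row['Name'], row_20['Name'])
```

Here people.loc[0] would raise a KeyError, because no row carries the label 0.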


Filtering and Querying Data

# Filter rows where Age > 30
df[df['Age'] > 30]

# Filter by multiple conditions
df[(df['Age'] > 25) & (df['City'] == 'London')]
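
A few other common filter forms, sketched on the same sample data: | for "or", isin for membership tests, and query for a string expression. Each condition needs its own parentheses when combined with & or |.

```python
import pandas as pd

sample = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo'],
})

# Either condition may hold
older_or_london = sample[(sample['Age'] > 30) | (sample['City'] == 'London')]

# City is one of several values
in_cities = sample[sample['City'].isin(['London', 'Tokyo'])]

# Same filter as the two-condition example, written as a query string
queried = sample.query("Age > 25 and City == 'London'")
print(queried)
```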


Modifying Data

Adding New Columns

df['Salary'] = [50000, 60000, 70000]

Updating Values

df.at[0, 'Age'] = 26

Renaming Columns

df.rename(columns={'Name': 'Full Name'}, inplace=True)


Handling Missing Data

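df.isnull()                # Boolean mask: True where a value is missing
df.dropna()                # Returns a copy without rows containing nulls
df.fillna(0)               # Returns a copy with nulls replaced by 0
# dropna() and fillna() do not change df; assign the result to keep it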
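
A short sketch of the three calls on data that actually contains a gap. The missing Age value is invented for illustration, and filling with the column mean is just one possible choice.

```python
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 35],   # None becomes NaN in a numeric column
})

missing_per_column = scores.isnull().sum()   # count of nulls per column
dropped = scores.dropna()                    # new DataFrame without Bob's row
filled = scores.fillna({'Age': scores['Age'].mean()})  # fill Age with the mean

print(missing_per_column)
print(filled)
```

The original scores DataFrame is left unchanged by both dropna and fillna.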


Grouping and Aggregating Data

df.groupby('City')['Age'].mean()
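
To compute several statistics per group in one pass, agg accepts a list of function names. A minimal sketch on made-up data with two rows for London so the grouping has something to combine:

```python
import pandas as pd

# Illustrative data; not from the example above
ages = pd.DataFrame({
    'City': ['London', 'London', 'Tokyo'],
    'Age': [30, 40, 35],
})

# One row per city, one column per statistic
summary = ages.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```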


Sorting Data

df.sort_values(by='Age', ascending=False)
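
sort_values also accepts a list of columns, with a matching list of sort directions. The sketch below sorts by City ascending and then by Age descending within each city; the data, including the extra name Dana, is hypothetical.

```python
import pandas as pd

roster = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'City': ['London', 'Tokyo', 'London', 'Tokyo'],
    'Age': [25, 30, 35, 28],
})

# sort_values returns a new DataFrame; reset_index renumbers the rows
ordered = (
    roster.sort_values(by=['City', 'Age'], ascending=[True, False])
          .reset_index(drop=True)
)
print(ordered)
```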


Merging and Joining DataFrames

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['A', 'B']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [85, 90]})

result = pd.merge(df1, df2, on='ID')
print(result)
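
By default, merge performs an inner join, so only IDs present in both frames survive. When the keys do not line up, the how parameter controls which rows are kept. A sketch with one extra, unmatched ID added for illustration:

```python
import pandas as pd

names = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['A', 'B', 'C']})
marks = pd.DataFrame({'ID': [1, 2], 'Score': [85, 90]})

inner = pd.merge(names, marks, on='ID')                  # IDs 1 and 2 only
left_join = pd.merge(names, marks, on='ID', how='left')  # keeps ID 3, Score is NaN

print(inner)
print(left_join)
```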


Exporting Data

df.to_csv('output.csv', index=False)      # index=False omits the row index column
df.to_excel('output.xlsx', index=False)   # .xlsx output needs an engine such as openpyxl installed
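
One way to check an export is to read it straight back. The sketch below round-trips a small, made-up DataFrame through CSV using an in-memory buffer instead of a file, so nothing is written to disk.

```python
import io
import pandas as pd

original = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

buffer = io.StringIO()
original.to_csv(buffer, index=False)   # write CSV text into the buffer
buffer.seek(0)                         # rewind before reading
restored = pd.read_csv(buffer)

print(restored)
```

Without index=False, the row index would be written as an extra unnamed column and show up again on reading.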