Data Visualization in Python using Matplotlib and Seaborn
What is Data Visualization?
Data visualization is the graphical representation of data and information using charts, graphs, and plots. In Python, two of the most popular libraries for data visualization are:
- Matplotlib: Low-level, highly customizable plotting library.
- Seaborn: Built on top of Matplotlib, offers a higher-level interface and attractive statistical plots.
Installing Libraries
pip install matplotlib seaborn
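One quick way to check that the installation worked is to import both packages and print their versions. This is only a sanity check; the version numbers printed will depend on your environment.

```python
import matplotlib
import seaborn

# Both packages expose their version as a string.
print("matplotlib", matplotlib.__version__)
print("seaborn", seaborn.__version__)
```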
1. Introduction to Matplotlib
Basic Line Plot
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
plt.plot(x, y)
plt.title("Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()
Bar Chart
categories = ['A', 'B', 'C']
values = [10, 30, 20]
plt.bar(categories, values)
plt.title("Bar Chart")
plt.xlabel("Category")
plt.ylabel("Value")
plt.show()
Pie Chart
labels = ['Apple', 'Banana', 'Cherry']
sizes = [30, 50, 20]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.title("Fruit Distribution")
plt.axis('equal')
plt.show()
Histogram
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title("Histogram of Random Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
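To see what the histogram is actually drawing, the same binning can be done with NumPy directly. The sketch below assumes a fixed seed (an addition for repeatability, not part of the example above) and counts how many values fall into each of the 30 bins.

```python
import numpy as np

# Fixed seed so the run is repeatable (an assumption for this sketch).
rng = np.random.default_rng(seed=0)
data = rng.standard_normal(1000)

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(data, bins=30)

print(len(counts))   # 30 bins
print(len(edges))    # 31 bin edges
print(counts.sum())  # 1000, every value lands in exactly one bin
```

The bar heights in the histogram correspond to `counts`, and the bar boundaries to `edges`.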
2. Introduction to Seaborn
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Sample DataFrame
data = pd.DataFrame({
    'Age': [22, 25, 30, 35, 40, 45],
    'Salary': [25000, 32000, 48000, 58000, 60000, 75000],
    'Department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance']
})
Seaborn Line Plot
sns.lineplot(x='Age', y='Salary', data=data)
plt.title("Salary vs Age")
plt.show()
Seaborn Bar Plot
sns.barplot(x='Department', y='Salary', data=data)
plt.title("Average Salary by Department")
plt.show()
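By default, `sns.barplot` aggregates the rows in each category and draws their mean (along with an error bar). The heights of the bars can be checked with a plain pandas `groupby`, sketched here on the same sample data.

```python
import pandas as pd

# Same sample data as in the Seaborn section.
data = pd.DataFrame({
    'Age': [22, 25, 30, 35, 40, 45],
    'Salary': [25000, 32000, 48000, 58000, 60000, 75000],
    'Department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance']
})

# Mean salary per department: the value each bar represents.
mean_salary = data.groupby('Department')['Salary'].mean()

print(mean_salary['HR'])       # 28500.0
print(mean_salary['IT'])       # 53000.0
print(mean_salary['Finance'])  # 67500.0
```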
Seaborn Histogram / Distribution Plot
sns.histplot(data['Salary'], bins=10, kde=True)
plt.title("Salary Distribution")
plt.show()
Box Plot
sns.boxplot(x='Department', y='Salary', data=data)
plt.title("Salary Distribution by Department")
plt.show()
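Each box in a box plot spans the first to the third quartile, with a line at the median. As a rough sketch, the same quartiles can be computed per department with pandas; with only two rows per department in the sample data, the boxes are necessarily narrow.

```python
import pandas as pd

# Same sample data as in the Seaborn section.
data = pd.DataFrame({
    'Age': [22, 25, 30, 35, 40, 45],
    'Salary': [25000, 32000, 48000, 58000, 60000, 75000],
    'Department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance']
})

# One row per department, one column per quartile (0.25, 0.5, 0.75).
quartiles = (
    data.groupby('Department')['Salary']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

print(quartiles)
```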
Heatmap (Correlation Matrix)
# Pairwise correlation of the numeric columns, drawn as an annotated heatmap
correlation = data[['Age', 'Salary']].corr()
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
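The numbers annotated on the heatmap are Pearson correlation coefficients, which is what `DataFrame.corr()` computes by default. The sketch below works out the Age/Salary coefficient by hand with NumPy and compares it with the pandas result; the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the Seaborn section.
data = pd.DataFrame({
    'Age': [22, 25, 30, 35, 40, 45],
    'Salary': [25000, 32000, 48000, 58000, 60000, 75000],
    'Department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance']
})

age = data['Age'].to_numpy(dtype=float)
salary = data['Salary'].to_numpy(dtype=float)

# Pearson r = covariance / (std of x * std of y), all population-style.
covariance = np.mean((age - age.mean()) * (salary - salary.mean()))
r_manual = covariance / (age.std() * salary.std())

# pandas computes the same coefficient.
r_pandas = data['Age'].corr(data['Salary'])

print(round(r_manual, 4), round(r_pandas, 4))
```

A value close to 1 indicates that salary rises almost linearly with age in this small sample.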
Comparing Matplotlib vs Seaborn
| Feature | Matplotlib | Seaborn |
|---|---|---|
| Level | Low-level | High-level |
| Customization | Full control over every plot element | Attractive defaults; fine-tuning goes through Matplotlib |
| Ease of Use | Steeper learning curve | Easier for beginners |
| Use Case | Detailed, fully customized plots | Quick statistical data visualization |
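To make the comparison concrete, here is a sketch that draws the same "average salary by department" chart with each library, side by side. It reuses the sample data from the Seaborn section; the layout and axis names are choices made for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({
    'Age': [22, 25, 30, 35, 40, 45],
    'Salary': [25000, 32000, 48000, 58000, 60000, 75000],
    'Department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance']
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: aggregate the data yourself, then draw the bars.
means = data.groupby('Department', sort=False)['Salary'].mean()
ax_mpl.bar(means.index, means.values)
ax_mpl.set_title("Matplotlib")
ax_mpl.set_xlabel("Department")
ax_mpl.set_ylabel("Salary")

# Seaborn: pass the raw rows; barplot computes the mean per category.
sns.barplot(x='Department', y='Salary', data=data, ax=ax_sns)
ax_sns.set_title("Seaborn")

fig.tight_layout()
plt.show()
```

The Matplotlib half needs an explicit aggregation step, while the Seaborn half works directly from the DataFrame, which reflects the low-level versus high-level split in the table.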