Machine Learning Projects
Project 1: Iris Flower Classification (ML Classification Project)
Objective: Predict the species of an iris flower (Setosa, Versicolor, or Virginica) from four measurements: sepal length, sepal width, petal length, and petal width.
Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
Step 2: Load and Explore the Dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target
df['species'] = df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})
print(df.head())
Step 3: Data Visualization
sns.pairplot(df, hue='species')
plt.show()
Step 4: Train-Test Split
X = df.iloc[:, :-1]
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
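The split above is purely random, so the three species are not guaranteed to appear in equal numbers in the test set. The following is a minimal sketch of a stratified split, assuming you want the class proportions preserved in both subsets. The `stratify` argument is not used in the original code, and the `_demo` names are hypothetical, chosen so they do not overwrite the tutorial's variables.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the raw arrays directly; target is encoded as 0, 1, 2.
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo asks the splitter to keep class proportions equal
# in the training and test subsets.
_, _, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Count how many samples of each class ended up in each subset.
train_counts = np.bincount(y_tr_demo)
test_counts = np.bincount(y_te_demo)
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

Because the Iris dataset has 50 samples per species, a stratified 70/30 split should give each species the same share of the test set. To use this in the project, pass `stratify=y` to the existing `train_test_split` call.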
Step 5: Feature Scaling
scaler = StandardScaler()
# Fit the scaler on the training data only, then apply the same transform to the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
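The scaling step fits on the training data and only transforms the test data, which keeps test-set statistics out of training. One way to make that ordering automatic is scikit-learn's `make_pipeline`. This is a sketch of an alternative arrangement, not the approach the project itself uses, and the `_p` variable names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_p, y_p = load_iris(return_X_y=True)
X_tr_p, X_te_p, y_tr_p, y_te_p = train_test_split(
    X_p, y_p, test_size=0.3, random_state=42
)

# fit() learns the scaling statistics from the training data only;
# score() and predict() reuse those statistics on new data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_tr_p, y_tr_p)

# Inspect the fitted scaler: scaled training features have mean ~0.
fitted_scaler = pipe.named_steps["standardscaler"]
train_scaled = fitted_scaler.transform(X_tr_p)
print("Scaled training means:", train_scaled.mean(axis=0).round(6))

pipe_acc = pipe.score(X_te_p, y_te_p)
print("Test accuracy:", pipe_acc)
```

With a pipeline, the scaler and the model travel together, so the same object can be passed to cross-validation helpers without refitting the scaler by hand.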
Step 6: Model Training
model = LogisticRegression()
model.fit(X_train, y_train)
Step 7: Evaluation
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
Output: A confusion matrix (rows are the actual species, columns are the predicted species) and a classification report with per-class precision, recall, and F1-score. Together they show how well the model classifies each species.
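To see how the precision and recall figures relate to the confusion matrix, the sketch below derives them by hand on a small set of made-up labels. The labels are hypothetical and chosen only to keep the arithmetic easy to follow; they are not results from the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: one versicolor flower is misclassified as virginica.
labels = ["setosa", "versicolor", "virginica"]
y_true_demo = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred_demo = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
print(cm)

# The diagonal holds the correctly classified counts per class.
true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)  # column sums: how many were predicted as each class
recall = true_pos / cm.sum(axis=1)     # row sums: how many actually belong to each class

# Cross-check against scikit-learn's own per-class scores.
sk_precision = precision_score(y_true_demo, y_pred_demo, labels=labels, average=None)
sk_recall = recall_score(y_true_demo, y_pred_demo, labels=labels, average=None)
print("Precision:", precision, "Recall:", recall)
```

Here virginica has a precision of 2/3 (one of its three predictions is wrong) and versicolor has a recall of 1/2 (one of its two flowers is missed).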
Project 2: House Price Prediction (ML Regression Project)
Objective: Predict house prices from features such as area, number of bedrooms, and location.
Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Load the Dataset
Use a CSV file such as housing.csv, or load a well-known housing dataset. The steps below assume the data has a price column to use as the target.
df = pd.read_csv('housing.csv')
print(df.head())
Step 3: Preprocess the Data
df = df.dropna()  # Remove rows with missing values
df = pd.get_dummies(df, drop_first=True)  # One-hot encode categorical columns
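The effect of these two preprocessing calls is easier to see on a tiny table. The sketch below uses a hypothetical four-row frame; the column names and values are illustrative and are not taken from housing.csv.

```python
import pandas as pd

# Hypothetical toy data: one missing area, one categorical column.
toy = pd.DataFrame({
    "area": [1200, 850, None, 1500],
    "location": ["city", "suburb", "city", "rural"],
    "price": [300000, 180000, 250000, 210000],
})

# dropna() removes the row whose area is missing.
toy_clean = toy.dropna()

# get_dummies() replaces "location" with one indicator column per category;
# drop_first=True drops the first category (alphabetically, "city"),
# which becomes the implicit baseline.
toy_encoded = pd.get_dummies(toy_clean, drop_first=True)
print(toy_encoded.columns.tolist())
print(toy_encoded)
```

Dropping one category avoids perfectly correlated indicator columns, which matters for linear models such as the one trained in the next steps.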
Step 4: Train-Test Split
X = df.drop('price', axis=1)
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 5: Model Training
model = LinearRegression()
model.fit(X_train, y_train)
Step 6: Model Evaluation
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))
Step 7: Visualize Predictions
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted")
plt.show()
Output:
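MSE (Mean Squared Error): The average squared difference between actual and predicted prices, expressed in squared price units (lower is better).
R² Score: The proportion of variance in the actual prices that the model explains (closer to 1 is better; it can be negative when the model fits worse than always predicting the mean).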
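Both metrics can be computed by hand in a few lines, which helps when interpreting the numbers. The sketch below uses four made-up price pairs (hypothetical values, in thousands) and checks the manual results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted prices, in thousands.
actual = np.array([200.0, 250.0, 300.0, 350.0])
predicted = np.array([210.0, 240.0, 310.0, 330.0])

# MSE: mean of the squared residuals.
residuals = actual - predicted
mse_manual = np.mean(residuals ** 2)

# R^2: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE is in the same units as the price, so it is often easier to read.
rmse_manual = np.sqrt(mse_manual)

print("MSE:", mse_manual, "sklearn:", mean_squared_error(actual, predicted))
print("R2:", r2_manual, "sklearn:", r2_score(actual, predicted))
print("RMSE:", rmse_manual)
```

For these values the squared residuals are 100, 100, 100, and 400, so the MSE is 175 and the R² is 1 - 700/12500 = 0.944.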
Final Thoughts:
| Iris Project | House Price Prediction |
|---|---|
| Classification problem | Regression problem |
| Predicts a category (species) | Predicts a continuous value (price) |
| Typical models: Logistic Regression, KNN, etc. | Typical models: Linear Regression, Ridge, etc. |
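Because scikit-learn estimators share the same `fit` / `score` interface, trying the alternative models named in the table takes only a small change. The sketch below is one way to compare them. The regression half uses synthetic data from `make_regression` as a stand-in, since the contents of housing.csv are not known here, and all variable names and hyperparameters (`n_neighbors=5`, `alpha=1.0`) are illustrative assumptions.

```python
from sklearn.datasets import load_iris, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Classification: compare two classifiers on the Iris data.
X_c, y_c = load_iris(return_X_y=True)
X_c_tr, X_c_te, y_c_tr, y_c_te = train_test_split(
    X_c, y_c, test_size=0.3, random_state=42
)
clf_scores = {}
for clf in (LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X_c_tr, y_c_tr)
    clf_scores[type(clf).__name__] = clf.score(X_c_te, y_c_te)  # accuracy

# Regression: synthetic data stands in for the housing CSV.
X_r, y_r = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_r_tr, X_r_te, y_r_tr, y_r_te = train_test_split(
    X_r, y_r, test_size=0.2, random_state=42
)
reg_scores = {}
for reg in (LinearRegression(), Ridge(alpha=1.0)):
    reg.fit(X_r_tr, y_r_tr)
    reg_scores[type(reg).__name__] = reg.score(X_r_te, y_r_te)  # R^2

print("Accuracy:", clf_scores)
print("R2:", reg_scores)
```

Note that `score` returns accuracy for classifiers and R² for regressors, so the two dictionaries are not directly comparable with each other.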