Day 19: Machine Learning with Scikit-Learn

Day 19 introduces the basics of machine learning with scikit-learn. We'll go through each concept step by step with runnable code, and the exercises will help you practice and solidify your understanding.

Topics:

Introduction to Machine Learning
Basic Models: Linear Regression and Decision Trees
Train-Test Split
Evaluating Models

We'll be using scikit-learn (a popular Python library) to implement machine learning algorithms.

1. Introduction to Machine Learning

What is Machine Learning?
Machine learning allows computers to learn from data and make decisions based on that data, without being explicitly programmed. There are different types of machine learning:

Supervised Learning: The model is trained on labeled data (input-output pairs). Examples: Classification, Regression.
Unsupervised Learning: The model is trained on unlabeled data to find hidden patterns. Examples: Clustering, Dimensionality reduction.
Reinforcement Learning: The model learns by interacting with an environment and receiving feedback.

For today, we will focus on supervised learning, which includes two main tasks:

Regression: Predicting a continuous value (e.g., predicting house prices).
Classification: Predicting a discrete class (e.g., classifying spam emails).
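
To make the difference between the two tasks concrete, here is a minimal sketch on hypothetical toy data (the arrays below are invented for illustration). It assumes scikit-learn's LogisticRegression as the classifier, which is not otherwise covered in this lesson; any classifier would show the same contrast.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one feature, six samples
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
X_new = np.array([[1.5], [5.5]])  # two unseen inputs to predict for

# Regression: the target is a continuous number
y_continuous = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])
reg = LinearRegression().fit(X_toy, y_continuous)
reg_pred = reg.predict(X_new)

# Classification: the target is a discrete label (0 or 1)
y_labels = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X_toy, y_labels)
clf_pred = clf.predict(X_new)

print("Regression predictions (continuous):", reg_pred)
print("Classification predictions (labels):", clf_pred)
```

The regressor returns real-valued numbers, while the classifier returns one of the labels it saw during training.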

2. Linear Regression Model

Linear regression is a simple model that predicts a continuous target variable based on one or more input features. The model assumes a linear relationship between the inputs and the target.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate random data for demonstration
np.random.seed(42)
X = np.random.rand(100, 1) * 10  # Feature: random numbers between 0 and 10
y = 2.5 * X + 1.5 + np.random.randn(100, 1)  # Target: linear relationship with some noise

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

Key Concepts in the Code:

train_test_split: Splits the dataset into training and testing sets.
LinearRegression: The linear regression model.
fit: Trains the model on the data.
predict: Makes predictions using the trained model.
mean_squared_error: Evaluates the model by averaging the squared differences between predicted and actual values (lower is better).
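
Because the demonstration data is generated from a known line (slope 2.5, intercept 1.5), one way to check that fit worked is to inspect the parameters the model learned. The sketch below regenerates the same data and reads the coef_ and intercept_ attributes of the fitted LinearRegression. The names slope and intercept are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same synthetic data as in the linear regression example
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X + 1.5 + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# y is 2-D here, so coef_ has shape (1, 1) and intercept_ has shape (1,)
slope = model.coef_[0, 0]
intercept = model.intercept_[0]

print(f"Learned slope: {slope:.2f} (true value: 2.5)")
print(f"Learned intercept: {intercept:.2f} (true value: 1.5)")
```

The learned values should land close to, but not exactly on, the true ones, since the target includes random noise.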

3. Decision Tree Model

A decision tree is another model used for both regression and classification. It splits the data into branches based on feature values: each internal node tests one feature against a threshold, and each leaf holds the prediction for the samples that reach it.

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Initialize the Decision Tree Regressor model
tree_model = DecisionTreeRegressor(random_state=42)

# Train the model (reuses X_train and y_train from the linear regression example)
tree_model.fit(X_train, y_train)

# Make predictions
y_tree_pred = tree_model.predict(X_test)

# Evaluate the model using Mean Squared Error (MSE)
tree_mse = mean_squared_error(y_test, y_tree_pred)
print(f"Decision Tree Mean Squared Error: {tree_mse}")
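
An unrestricted decision tree can keep splitting until it memorizes the training data. As a rough sketch of that effect, the code below compares a tree with no depth limit against one capped with the max_depth hyperparameter, using the same synthetic data. The choice of max_depth=3 is an arbitrary value for illustration, not a recommendation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same synthetic data as before, with y flattened to 1-D
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = (2.5 * X + 1.5 + np.random.randn(100, 1)).ravel()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for depth in (None, 3):  # None means no depth limit
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    results[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

A training error near zero alongside a clearly larger test error is a sign of overfitting. Comparing both numbers is more informative than looking at the test error alone.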

4. Train-Test Split

Before we can evaluate any model, it's important to split the dataset into two parts:

Training data: Used to train the model.
Testing data: Used to evaluate how well the model generalizes to new data.

from sklearn.model_selection import train_test_split

# Assuming we have a dataset X (features) and y (target)
# Note: this overwrites the earlier 80/20 split, so re-fit any model before scoring it on this X_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Training data size:", len(X_train))
print("Testing data size:", len(X_test))
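
The random_state argument controls how the rows are shuffled before splitting. The following sketch, on a hypothetical dataset of the integers 0 to 99, checks three properties: the split sizes, that the same seed reproduces the same split, and that no sample appears in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 samples whose feature value equals their index
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# Two splits with the same seed, one with a different seed
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
c_train, c_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=0)

same_seed_matches = np.array_equal(a_test, b_test)
different_seed_matches = np.array_equal(a_test, c_test)
no_overlap = set(a_train.ravel()).isdisjoint(a_test.ravel())

print("Train/test sizes:", len(a_train), len(a_test))
print("Same seed gives the same test set:", same_seed_matches)
print("Different seed gives the same test set:", different_seed_matches)
print("Train and test sets share no samples:", no_overlap)
```

Fixing random_state makes results reproducible between runs, which helps when comparing models on exactly the same split.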

5. Evaluating Models

Evaluating models is essential to determine how well they perform on unseen data. For regression tasks, some common evaluation metrics include:

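Mean Squared Error (MSE): The average of the squared differences between predicted and actual values. Lower is better, and the value is in squared units of the target.
R-Squared (R²): The proportion of the variance in the target variable that the model explains. A value of 1.0 is a perfect fit; 0.0 is no better than always predicting the mean.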

# Evaluate the model using R-Squared
r2 = model.score(X_test, y_test)
print(f"R-Squared: {r2}")
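
To see what these two metrics actually compute, the sketch below calculates MSE and R² by hand with NumPy and compares them with scikit-learn's results. It regenerates the same synthetic data so that it runs on its own. The names manual_mse, manual_r2, ss_res, and ss_tot are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic data as before, with y flattened to 1-D
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = (2.5 * X + 1.5 + np.random.randn(100, 1)).ravel()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE: mean of the squared residuals
manual_mse = np.mean((y_test - y_pred) ** 2)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(f"MSE: manual={manual_mse:.4f}, sklearn={mean_squared_error(y_test, y_pred):.4f}")
print(f"R^2: manual={manual_r2:.4f}, sklearn={r2_score(y_test, y_pred):.4f}")
```

For regressors, model.score(X_test, y_test) returns the same R² value as r2_score(y_test, y_pred).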

Summary

In Day 19, you learned how to implement:

Linear Regression to model linear relationships.
Decision Tree for modeling non-linear relationships.
Train-Test Split for model evaluation.
Model Evaluation using MSE and R².

Feel free to experiment with different datasets, hyperparameters, and evaluation metrics. Machine learning is a vast field, and each step opens up many opportunities to learn more!

Need Help?

If you have any questions or want to explore a particular topic in more detail, let me know! Would you like to try the exercises, or do you need help with the code?
