Topic 18 | Evaluation Metrics for Different Models




Our previous topic was Linear Regression. Linear Regression is used to predict a continuous number from numeric (or numerically encoded nominal) input values. The predicted number can be anything: positive or negative, high or low. A graphical representation of linear regression was also shown: a plot on a 2D plane with the data points and a line that should have most of the data points close to it. To measure how well the line fits, we discussed the Mean Squared Error, which squares the errors so that positive and negative errors do not cancel each other out to zero, and the Root Mean Squared Error.

In Colab we trained a model on a data set. An issue arose during execution: the resulting Mean Squared Error was very high. Today's topic starts with finding the reasons for this high value.


We will find the reasons for the high Mean Squared Error and then the actions that reduce it. Remember that some loss will remain even after we make every effort to mitigate it. We can apply a number of regression techniques to train a model, so how do we decide which model performs better? The Mean Squared Error helps us choose: when the models are evaluated on the same test data, the one with the lower Mean Squared Error performs better.
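As a minimal sketch of this comparison, the snippet below computes the Mean Squared Error by hand for two hypothetical models. The numbers and the names model_a_pred and model_b_pred are made up for illustration and do not come from the notebook.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up values, purely for illustration
actual = [100.0, 150.0, 200.0, 250.0]
model_a_pred = [110.0, 140.0, 190.0, 260.0]  # off by 10 on every row
model_b_pred = [100.0, 150.0, 200.0, 290.0]  # perfect except one large miss

mse_a = mean_squared_error(actual, model_a_pred)
mse_b = mean_squared_error(actual, model_b_pred)

print("Model A MSE:", mse_a)
print("Model B MSE:", mse_b)
print("Better model:", "A" if mse_a < mse_b else "B")
```

Because the errors are squared, the single large miss of model B costs more than the four small misses of model A.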

The line in the linear regression we discussed lives in two dimensions. In three-dimensional linear regression, the line becomes a flat surface like a whiteboard, which is called a plane in mathematics. Linear regression in more than three dimensions is something the human brain cannot visualize.

Even with n independent variables and one dependent variable, linear regression is still possible; the whiteboard-like plane then extends in n directions (a hyperplane).
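Written as a formula, linear regression with n independent variables takes the form below. The symbols b_0 (the intercept) and b_1 to b_n (one coefficient per independent variable) are notation chosen here for illustration, not taken from the lecture.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n
```

With n = 1 this is the familiar line, and with n = 2 it is the plane.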

One thing we can do to reduce the Mean Squared Error is to bring the different attributes onto a comparable scale. As we saw while training the model, the values of some attributes (features, columns) were very high and the values of others were very low. With such values, the model tends to focus on the high values and learn more from them. It therefore learns less from attributes with small values, for example between 0 and 1. If the model gave due importance to these small values too, it could perform better.

What is the cure for this problem? When we discussed data preprocessing we mentioned data transformation and data reduction. We applied data reduction rigorously to the Titanic data set: using correlation, we separated the columns that were suitable for model training from those that were not. So dimensionality reduction in data preprocessing means dropping the unnecessary columns from the input data set, so that the model learns only from useful data.

Now we put data transformation into practice. As discussed, our data set contains some high values (in the thousands) and some low values (from 0 to 1). Because of this huge difference, the model does not give due weight to the low values between 0 and 1. So we need to transform the data in such a way that the model gives equal importance to all attributes of the data set. When the model pays equal attention to all attributes, it has more opportunity to learn. Data transformation is thus used to put the features of the data set on an equal footing.

Suppose we have to measure the performance of the players in a team. We have the weights of the players (60 kg, 65 kg, 55 kg, 50 kg) and their heights (163 cm, 165 cm, 173 cm, etc.). The model trains only on the numbers and pays no attention to the units, so it gives more weight to the high values and ignores or pays less attention to the low values. This affects the model badly. To rectify this, we limit the values to a range and have the model use the values within that range.



Let us discuss a few methods of data transformation.

1.  Z-score is a variation of scaling that represents the number of standard deviations away from the mean. We would use z-score to ensure our feature distributions have mean = 0 and std = 1. It's useful when there are a few outliers, but not so extreme that we need clipping.

2. Min-max scaling, also known as rescaling or min-max normalization, is the simplest method. It rescales the range of each feature to [0, 1] or [−1, 1]. The choice of target range depends on the nature of the data.
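The two methods can be sketched in plain Python as follows. This is an illustration under simple assumptions (a single feature column, non-constant values), reusing the weights and heights from the players example; the function names z_score and min_max are chosen here and are not part of any library.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values so they have mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; must not be 0
    return [(v - mu) / sigma for v in values]


def min_max(values, low=0.0, high=1.0):
    """Rescale values linearly into the range [low, high]."""
    v_min, v_max = min(values), max(values)  # v_max must differ from v_min
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]


weights_kg = [60, 65, 55, 50]  # from the players example
heights_cm = [163, 165, 173]   # from the players example

print("z-score of weights:", z_score(weights_kg))
print("min-max of weights:", min_max(weights_kg))
print("min-max of heights:", min_max(heights_cm))
```

After either transformation, weights and heights live on comparable scales even though their original units differ.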


Now the question arises whether scaling or standardization should be applied only to the high-value features or to all features in the data set. Should the target variable also be part of that scaling or standardization?

We now discuss different regression algorithms.



Suppose we have to predict the score of a batsman from a data set. We divide the data set by ranges of values and hand each range to a different group of people to predict. Suppose the rows with scores from 1 to 10 are given to one group of 25 people, the rows with scores from 11 to 50 to another group, and so on. We define the range of scores that each group will predict.

Decision Tree regression works in a similar way. Each branch of the tree handles a different range of values, and the resulting value is predicted from the range the input falls into.
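The range idea can be sketched as a tiny hand-built "tree". This is only an illustration: the data is hypothetical, the split points are picked by hand, and the splits are made on an input feature, whereas a real decision tree learns its split points from the training data.

```python
from statistics import mean

# Hypothetical training data: (balls_faced, runs_scored)
training = [(5, 4), (8, 7), (10, 9), (30, 25), (40, 38), (45, 41), (80, 70), (90, 85)]

# Hand-picked split points on the input feature; a real tree searches for
# the splits that reduce the squared error the most.
split_points = [20, 60]


def leaf_index(x):
    """Return which range (leaf) the input value falls into."""
    for i, threshold in enumerate(split_points):
        if x < threshold:
            return i
    return len(split_points)


# Each leaf predicts the mean target of the training rows that fall into it
leaf_targets = {}
for x, y in training:
    leaf_targets.setdefault(leaf_index(x), []).append(y)
leaf_prediction = {leaf: mean(ys) for leaf, ys in leaf_targets.items()}


def predict(x):
    """Predict with the mean of the leaf that x belongs to."""
    return leaf_prediction[leaf_index(x)]


print(predict(7), predict(35), predict(100))
```

Every input that lands in the same range receives the same prediction, which is why a tree's output looks like a staircase rather than a line.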

For now, the only thing to understand is that decision trees can be used for both regression and classification.

Random Forest regression is a form of decision tree regression. It can be explained with an example: suppose we face a problem in a forest and each tree in the forest is like a friend. We ask each tree, or friend, for a suggestion about the problem. It is like voting: we decide after considering the opinions of all the friends. In classification the majority opinion wins, while in regression the individual predictions are averaged.
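For regression, the "opinions" are numbers, so combining them means taking their average. The sketch below shows only this combining step; the three tree functions are hypothetical stand-ins, whereas a real random forest trains each tree on a random sample of the rows and features.

```python
from statistics import mean


# Hypothetical "trees": each one is a simple rule predicting a price from an area
def tree_1(area):
    return 1000.0 * area


def tree_2(area):
    return 900.0 * area + 20000.0


def tree_3(area):
    return 1100.0 * area - 10000.0


forest = [tree_1, tree_2, tree_3]


def forest_predict(area):
    """Average the predictions of all trees (the regression form of voting)."""
    return mean(tree(area) for tree in forest)


print("Individual opinions:", [tree(100.0) for tree in forest])
print("Forest prediction:", forest_predict(100.0))
```

Averaging smooths out the mistakes of individual trees, which is the main reason a forest usually generalizes better than one tree.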


Gradient Boosting is another extension of the Decision Tree Regressor. In this regressor we act on the opinion given; if it does not work, we come back, take an opinion again, and avoid the actions that produced the wrong results.
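One common way to realize this "learn from the previous mistakes" idea is gradient boosting with squared error, where each new small tree is fitted to the residuals left by the current predictions. The toy sketch below assumes made-up 1-D data, one-split trees (stumps), and a learning rate of 0.5; it is not scikit-learn's implementation.

```python
from statistics import mean

# Made-up 1-D data for illustration (xs must be sorted and distinct)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 12.0, 15.0, 30.0, 33.0, 35.0]


def fit_stump(xs, targets):
    """Find the single threshold split that minimizes the squared error."""
    best = None
    for threshold in xs[1:]:
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


learning_rate = 0.5
predictions = [mean(ys)] * len(ys)  # start from the plain average
history = [mse(ys, predictions)]

for _ in range(10):
    # The residuals are the mistakes of the current ensemble
    residuals = [y - p for y, p in zip(ys, predictions)]
    stump = fit_stump(xs, residuals)
    # Correct the predictions by a fraction of what the new stump suggests
    predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    history.append(mse(ys, predictions))

print("MSE before boosting:", history[0])
print("MSE after 10 rounds:", history[-1])
```

Each round looks only at what is still wrong, so the training error shrinks step by step.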


Now we move to the Colab notebook. Our practice session starts with the question of which features scaling should be applied to. Normally scaling is applied to all features, including the target. However, in this notebook scaling will be applied only to the input features; scaling all features including the target will be our task in the assignment.



Next, I will feed these features into various regression algorithms to determine the best performer using a simple framework: Split, Fit, Predict, Score It.

Target Variable Splitting

We will split the full dataset into input and target variables.

The inputs are also called feature variables; the output is the target variable.


# Split data to be used in the models

# Create matrix of features

x = full_data.drop('Price', axis = 1) # grabs everything else but 'Price'


# Create target variable

y = full_data['Price'] # y is the column we're trying to predict


Before we train the models, it's essential to split our data into training and testing sets. This ensures that we have a separate dataset to evaluate the performance of our trained models. The common practice is to use a certain portion of our data for training (e.g., 70-80%) and the remaining portion for testing (e.g., 20-30%).


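from sklearn import preprocessing

# Fit the scaler: learn each column's mean and standard deviation

pre_process = preprocessing.StandardScaler().fit(x)

# Apply the learned scaling to the feature matrix

x_transform = pre_process.transform(x)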


Now we are applying feature scaling to our feature matrix x using the StandardScaler from scikit-learn. Feature scaling is a common preprocessing step in machine learning to standardize or normalize the features so that they have a mean of 0 and a standard deviation of 1. This can be helpful, especially for algorithms that are sensitive to the scale of input features.

In the code above, we first create an instance of StandardScaler and then fit it to our data (x) using the fit method. After that, we apply the transformation to the feature matrix using the transform method, which gives us x_transform with scaled features.

Remember, when we use the same scaling parameters for both the training and testing sets, it ensures that the features are scaled consistently, which is crucial for accurate model performance evaluation.
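A minimal sketch of what "the same scaling parameters" means: the mean and standard deviation are learned from the training column only and then reused, unchanged, on the test column. The values and the helper names fit_scaler and apply_scaler are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the scaling parameters (mean, standard deviation) from a column."""
    return mean(column), pstdev(column)


def apply_scaler(column, params):
    """Scale a column using previously learned parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in column]


train_area = [1000.0, 1500.0, 2000.0, 2500.0]  # hypothetical training values
test_area = [1200.0, 3000.0]                   # hypothetical test values

params = fit_scaler(train_area)            # learned from training data only
train_scaled = apply_scaler(train_area, params)
test_scaled = apply_scaler(test_area, params)  # same parameters reused

print("Scaling parameters:", params)
print("Scaled test values:", test_scaled)
```

The scaled test values need not have mean 0 themselves; what matters is that they are measured with the same ruler as the training values.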


# pipe = make_pipeline(StandardScaler(), LogisticRegression())

# pipe.fit(X_train, y_train)


We are creating a machine learning pipeline using scikit-learn's make_pipeline function. This pipeline combines feature scaling using StandardScaler() and a logistic regression model.

Let's break down the code:

  1. make_pipeline(StandardScaler(), LogisticRegression()): This function creates a pipeline that first applies StandardScaler() for feature scaling and then fits a LogisticRegression model on the scaled features. The pipeline ensures that the feature scaling is consistently applied to both the training and testing data.
  2. pipe.fit(X_train, y_train): This line of code fits the created pipeline to our training data X_train and corresponding target variable y_train. This means that the feature scaling and logistic regression model will be trained together as part of the pipeline.

After we run the fit method, our pipeline (pipe) will be trained and ready to make predictions on new data.

Here's a summary of what the pipeline does:

  1. Scales the features in X_train using StandardScaler.
  2. Fits a logistic regression model to the scaled features with the corresponding target variable y_train.

We can now use the trained pipeline to make predictions on new data or evaluate its performance on the test set (X_test and y_test).
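The pipeline steps can be sketched end to end as below. Two things here are assumptions made for illustration: the data is synthetic, and LinearRegression is used in place of LogisticRegression because the target is a continuous price, while LogisticRegression is a classifier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): two features on very different scales
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(500, 5000, 200), rng.uniform(0, 1, 200)])
y = 50.0 * X[:, 0] + 20000.0 * X[:, 1] + rng.normal(0, 1000, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The scaler is fitted on X_train only; predict() reuses the same
# scaling parameters on X_test before calling the regression model.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)

y_pred = pipe.predict(X_test)
print("R^2 on the test set:", pipe.score(X_test, y_test))
```

Keeping the scaler inside the pipeline means the test rows never influence the scaling parameters.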


# x Represents the Features




We can check the shape and the contents of the feature matrix x_transform after applying the StandardScaler transformation. x_transform should be the scaled version of our original feature matrix x.

The output will show us the shape of x_transform, which will be a 2-dimensional numpy array with the same number of rows as the original feature matrix x and the number of columns representing the number of features.

The contents of x_transform will be the scaled values of our original features, where each column will have a mean of 0 and a standard deviation of 1. Note that the exact values will depend on the distribution and scaling of our original features.

Output of the above code:

Now our values have been standardized so that each column has mean 0 and standard deviation 1, and no very high values remain.

y # y represents the Target




# Use x and y variables to split the training data into train and test set

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x_transform, y, test_size = .10, random_state = 101)


We are using the transformed feature matrix x_transform and the target variable y to split our data into training and testing sets. The train_test_split function from scikit-learn is commonly used for this purpose. This allows us to have separate datasets for training and evaluating our machine learning models.

The code provided will split the data into training and testing sets, with 90% of the data used for training and 10% for testing.

After running this code, we will have the following datasets:

  • x_train: The transformed feature matrix for training our machine learning models.
  • x_test: The transformed feature matrix for evaluating our trained models.
  • y_train: The target variable corresponding to the training data.
  • y_test: The target variable corresponding to the testing data.

Now we can use x_train and y_train to train our models and then evaluate their performance on x_test and y_test. The test_size parameter controls the proportion of data that goes into the testing set. In this case, it's set to 0.10, meaning 10% of the data will be used for testing, while the remaining 90% will be used for training. The random_state parameter is set to 101, which is an arbitrary seed to ensure reproducibility. We can change it to any other value or set it to None for a random split.
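To make the roles of test_size and random_state concrete, here is a simplified sketch of a seeded shuffle-and-cut split. It illustrates the idea only; it is not scikit-learn's implementation, and the function name simple_train_test_split is hypothetical.

```python
import random


def simple_train_test_split(rows, test_size=0.10, seed=101):
    """Shuffle row indices with a fixed seed, then cut off the test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    n_test = max(1, round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(50))  # stand-in for 50 data rows
train_rows, test_rows = simple_train_test_split(rows)

print("Training rows:", len(train_rows))
print("Testing rows:", len(test_rows))
```

Running the function again with the same seed returns exactly the same split, which is what makes results reproducible.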



Model Training


# Fit

# Import model

from sklearn.linear_model import LinearRegression

from sklearn.pipeline import make_pipeline

from sklearn.preprocessing import StandardScaler

# Create instance of model

lin_reg = LinearRegression()

# Pass training data into model

lin_reg.fit(x_train, y_train)

# pipe = make_pipeline(StandardScaler(), LinearRegression())

# pipe.fit(x_train, y_train)




We are fitting a linear regression model to our training data using scikit-learn.

Let's break down the code:

The above code creates an instance of the LinearRegression model and then fits it to the scaled training data (x_train) along with the corresponding target variable (y_train). This process trains the model to learn the relationship between the features and the target variable.

We can now use the trained lin_reg model to make predictions on new data or evaluate its performance on the test set.

Regarding the commented-out code with make_pipeline: the pipeline encapsulates the feature scaling step using StandardScaler and the linear regression model. Since the features were already scaled and the lin_reg model has already been trained, there is no need to fit the pipeline on the same data (x_train and y_train). Instead, we can use the lin_reg model directly for further analysis.

Prediction

# Predict

y_pred = lin_reg.predict(x_test)

print(y_pred.shape)

print(y_pred)





We've successfully used the trained linear regression model (lin_reg) to make predictions on the test data (x_test). The predicted values are stored in the variable y_pred.

Let's break down the code:

The output will show us the shape of the y_pred array, which will be a 1-dimensional numpy array containing the predicted values for each sample in the test data. The number of elements in y_pred will be equal to the number of samples in x_test.

The second print statement will display the actual predicted values.

Keep in mind that the y_pred array contains the predictions made by the linear regression model for the corresponding samples in the test set. We can use these predicted values to evaluate the model's performance and compare them to the true target values (y_test).


import matplotlib.pyplot as plt

import seaborn as sns

sns.scatterplot(x=y_test, y=y_pred, color='blue', label='Actual Data points')

plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', label='Ideal Line')



We use the Seaborn library to create a scatter plot comparing the actual target values (y_test) with the predicted values (y_pred) from our linear regression model. Additionally, we add a red line representing the ideal line, where the predicted values perfectly match the actual values.

Here's a breakdown of the code:

In the scatter plot, each data point represents a sample from the test set. The x-axis represents the actual target values (y_test), and the y-axis represents the predicted values (y_pred) from our linear regression model. Each blue point pairs one actual value with its prediction, while the red line represents the ideal line.

If the points align closely along the red line, it suggests that the model's predictions are close to the actual values, indicating a good fit. However, if the points are scattered away from the red line, it indicates that the model's predictions deviate from the actual values.

Visualization like this can give us a quick visual understanding of how well our linear regression model is performing. For a more quantitative evaluation, we can use metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), or R-squared (coefficient of determination). These metrics provide insights into how well the model is capturing the variation in the data.
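The two metrics not computed elsewhere in this notebook, MAE and R-squared, can be sketched by hand as follows. The values are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the target's variation that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1.0 - ss_res / ss_tot


# Made-up values, purely for illustration
actual = [200.0, 300.0, 400.0, 500.0]
predicted = [210.0, 290.0, 420.0, 480.0]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)

print("MAE:", mae)
print("R-squared:", r2)
```

An R-squared close to 1 means the predictions explain most of the variation in the target, while a value near 0 means they do no better than always predicting the mean.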



import numpy as np

# Combine actual and predicted values side by side

results = np.column_stack((y_test, y_pred))


# Printing the results

print("Actual Values  |  Predicted Values")


for actual, predicted in results:

    print(f"{actual:14.2f} |  {predicted:12.2f}")


We've combined the actual target values (y_test) and the predicted values (y_pred) side by side using NumPy's column_stack function. The code above prints the results in a formatted table, with the actual values in the left column and the corresponding predicted values in the right column. The numbers are formatted with two decimal places for better readability.

This kind of side-by-side comparison allows us to visually assess how well our model's predictions align with the actual target values. If the predicted values are close to the actual values, we should see similar numbers in both columns. However, if there are substantial differences, it indicates that the model may not be performing well on certain samples or might need further improvements.



Residual Analysis

Residual analysis in linear regression is a way to check how well the model fits the data. It involves looking at the differences (residuals) between the actual data points and the predictions from the model.

In a good model, the residuals should be randomly scattered around zero on a plot. If there are patterns or a fan-like shape, it suggests the model may not be the best fit. Outliers, points far from the others, can also affect the model.

Residual analysis helps ensure the model's accuracy and whether it meets the assumptions of linear regression. If issues are found, adjustments to the model may be needed to improve its performance.



actual = np.asarray(y_test)  # assumption: `actual` holds the true test targets

residual = actual - y_pred.reshape(-1)

print(residual)

We've computed the residuals by subtracting the predicted values (y_pred) from the actual target values (actual) and stored the result in the residual array. Residuals represent the differences between the actual values and the corresponding predicted values.

In the code, y_pred.reshape(-1) is used to ensure that the predicted values are in the same shape as actual, so they can be directly subtracted. The result is an array of residuals, where each element corresponds to the difference between the actual value and the predicted value for a specific sample.

By examining the values in the residual array, we can gain insights into how well the model is performing. Ideally, the residuals should be close to zero, indicating that the model's predictions are accurate. Positive residuals indicate that the model underestimates the target variable, while negative residuals suggest overestimation.

Analysing the distribution of residuals and looking for patterns can help us identify areas where the model might be performing poorly and guide us in making further improvements to our model or data preprocessing.

If we want to get additional information about the overall performance of the model, we may consider computing metrics such as Mean Squared Error (MSE) or Mean Absolute Error (MAE) using the residuals. These metrics provide a quantitative measure of how well the model is predicting the target variable across the entire dataset.
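A small sketch of how residuals are read: their sign shows under- or overestimation, their mean shows overall bias, and their squares give the MSE. The prices below are hypothetical.

```python
# Hypothetical prices, purely for illustration
actual = [250000.0, 310000.0, 180000.0, 420000.0]
predicted = [240000.0, 330000.0, 180000.0, 400000.0]

residuals = [a - p for a, p in zip(actual, predicted)]

# Positive residual: prediction too low; negative residual: prediction too high
underestimated = sum(1 for r in residuals if r > 0)
overestimated = sum(1 for r in residuals if r < 0)

# A mean residual far from zero points to a systematic bias
mean_residual = sum(residuals) / len(residuals)

# The MSE is simply the average of the squared residuals
mse_from_residuals = sum(r ** 2 for r in residuals) / len(residuals)

print("Residuals:", residuals)
print("Underestimated:", underestimated, "Overestimated:", overestimated)
print("Mean residual:", mean_residual)
print("MSE:", mse_from_residuals)
```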



# Distribution plot for Residual (difference between actual and predicted values)

sns.histplot(residual, kde=True, stat='density')


We are using Seaborn's histplot function to create a distribution plot of the residuals (the older distplot function is deprecated in recent Seaborn releases). The residuals represent the differences between the actual target values and the corresponding predicted values from our linear regression model.

The histplot function creates a histogram of the residuals, and setting kde=True overlays a kernel density estimate (KDE) to visualize the shape of the distribution.

In the plot, the x-axis represents the range of residual values, and the y-axis shows the density or frequency of occurrences of each residual value. The distribution plot provides insights into the distribution of errors made by our linear regression model. Ideally, we would want the residuals to be centered around zero with a symmetric distribution, indicating that the model's predictions are unbiased and accurate. Deviations from this pattern might indicate areas where the model is not performing well.

Common patterns to look for in the distribution plot include:

  1. Symmetry: A symmetric distribution around zero suggests the model is making unbiased predictions.
  2. Skewness: If the distribution is skewed, it indicates that the model is systematically overestimating or underestimating the target variable.
  3. Outliers: Unusual large or small residuals (outliers) may indicate specific data points that the model is struggling to predict accurately.

By examining the distribution of residuals, we can gain insights into the overall performance of our model and identify potential areas of improvement.
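The patterns listed above can also be checked numerically. The sketch below computes a simple skewness measure and flags residuals that lie more than three standard deviations from the mean. The sample values are made up, and the three-sigma threshold is a common rule of thumb assumed here, not something prescribed by the lecture.

```python
from statistics import mean, pstdev


def skewness(values):
    """Third standardized moment: about 0 for a symmetric distribution."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def find_outliers(values, n_sigmas=3.0):
    """Return the values lying more than n_sigmas standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > n_sigmas * sigma]


# Made-up residuals, purely for illustration
symmetric = [-3.0, -1.0, 0.0, 1.0, 3.0]
right_skewed = [-1.0, -1.0, 0.0, 0.0, 12.0]
with_outlier = [1.0, -1.0] * 10 + [50.0]

print("Skewness (symmetric):", skewness(symmetric))
print("Skewness (right-skewed):", skewness(right_skewed))
print("Outliers:", find_outliers(with_outlier))
```

A clearly positive skewness means a tail of large positive residuals, that is, some rows where the model predicts far too low.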


This shows that the residuals of our model are not skewed, as the distribution is centered. Note, however, the scale of the X and Y axes: the values are in powers of 10^6. This means the difference between the actual and predicted values was high, but it has been reduced to some extent, which is good.

Model Evaluation

Linear Regression


# Score It

from sklearn.metrics import mean_squared_error


print('Linear Regression Model')

# Results


# mean_squared_error(y_test, y_pred)

mse = mean_squared_error(y_test, y_pred)

rmse = np.sqrt(mse)


# Print evaluation metrics

print("Mean Squared Error:", mse)

print("Root Mean Squared Error:", rmse)


We've evaluated the performance of our linear regression model using the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). These metrics are common measures used to assess how well a regression model's predictions match the actual target values.

The mean_squared_error function from scikit-learn calculates the MSE between the actual target values (y_test) and the predicted values (y_pred). MSE measures the average squared difference between predicted and actual values, penalizing larger errors more heavily.

The RMSE is simply the square root of the MSE. It represents the average magnitude of the errors in the same units as the target variable. RMSE is a commonly used metric for regression tasks because it is more interpretable and easier to relate to the scale of the original target variable.

By displaying both the MSE and RMSE, we get a sense of how well the model is performing on the test data. Smaller values for these metrics indicate better performance, as they suggest that the model's predictions are closer to the actual values.

Now, we have quantified the performance of our linear regression model using the evaluation metrics, which can help us compare this model to other models or assess its suitability for our specific task.


Linear Regression Model


Mean Squared Error: 9839952411.801708

Root Mean Squared Error: 99196.53427313732
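The relation between the two numbers printed above can be checked with a one-line calculation. The sketch uses only the MSE value reported in the output.

```python
import math

mse = 9839952411.801708  # MSE reported in the notebook output
rmse = math.sqrt(mse)    # RMSE is the square root of the MSE

print(f"RMSE: {rmse:,.2f}")
```

Because the square root undoes the squaring, the RMSE is expressed in the same units as the target (here, the price), which makes it easier to interpret than the MSE.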

# Linear Regression Model

# ------------------------------------------------------------

# Mean Squared Error: 10100187858.864885

# Root Mean Squared Error: 100499.69083964829

# 10170939558


Based on the provided results, it appears that the evaluation metrics for our linear regression model are as follows:

  • Mean Squared Error (MSE): 10,100,187,858.86
  • Root Mean Squared Error (RMSE): 100,499.69

The MSE is a measure of the average squared difference between the actual target values and the predicted values. In this case, it suggests that, on average, the squared difference between the predicted and actual values is quite large.

The RMSE, which is the square root of the MSE, gives an estimate of the average error in the same units as the target variable. In this case, it indicates that, on average, the model's predictions have an error of approximately 100,499.69 in the same units as the target variable.

Please note that these error values depend on the scale and units of the target variable. If our target variable is measured in larger units, it is not unusual to have larger values for the error metrics.

The last value "10170939558" seems to be a standalone number with no context provided.


s = 10100187858 - 9839952411






Decision Tree

from sklearn.tree import DecisionTreeRegressor

from sklearn.ensemble import RandomForestRegressor


rf_regressor = DecisionTreeRegressor()

rf_regressor.fit(x_train, y_train)


# Predicting the sale prices using the test set

y_pred_rf = rf_regressor.predict(x_test)


DTr = mean_squared_error(y_test, y_pred_rf)

# Decision Tree Regression MSE with test set

print('Decision Tree Regression : ',DTr)


We've created a Decision Tree Regressor and fitted it to our training data (x_train and y_train). We then used this Decision Tree Regressor to predict the target values for the test set (x_test) and stored the predictions in y_pred_rf.

Finally, we calculated the Mean Squared Error (MSE) between the predicted values (y_pred_rf) and the actual target values (y_test) using scikit-learn's mean_squared_error function, and we assigned the result to DTr.

Here's the code breakdown:

The DecisionTreeRegressor is a regression model that builds a decision tree to predict the target variable. In this code, we used the Decision Tree Regressor to make predictions (y_pred_rf) on the test set. The mean_squared_error function was then used to calculate the MSE between the predicted values (y_pred_rf) and the actual target values (y_test).

The printed output will show the Mean Squared Error for Decision Tree Regression with the test set.

Keep in mind that Decision Trees have some limitations, such as the tendency to overfit, and may not always provide the best performance for regression tasks. Random Forests, on the other hand, are an extension of Decision Trees that can often offer better generalization and predictive performance. If we want to try Random Forest Regression, we can use RandomForestRegressor from scikit-learn in a similar manner.


Decision Tree Regression :  31316806651.827576


Random Forest

from sklearn.tree import DecisionTreeRegressor

from sklearn.ensemble import RandomForestRegressor


rf_regressor = RandomForestRegressor()

rf_regressor.fit(x_train, y_train)


# Predicting the sale prices using the test set

y_pred_rf = rf_regressor.predict(x_test)

RFr = mean_squared_error(y_test, y_pred_rf)

# Random Forest Regression MSE with test set

print('Random Forest Regression : ',RFr)


Now we've created a Random Forest Regressor, fitted it to our training data (x_train and y_train), and used it to predict the target values for the test set (x_test). We stored the predictions in y_pred_rf.

Next, we calculated the Mean Squared Error (MSE) between the predicted values (y_pred_rf) and the actual target values (y_test) using scikit-learn's mean_squared_error function, and we assigned the result to RFr.

Here's the code breakdown:

Random Forest Regression is an ensemble learning method that combines multiple decision trees to improve predictive performance and reduce overfitting. The RandomForestRegressor class in scikit-learn implements the Random Forest algorithm for regression tasks.

The printed output will show the Mean Squared Error for Random Forest Regression with the test set.

Comparing the MSE values for Decision Tree Regression and Random Forest Regression can give us insights into which model performs better on this specific task. Generally, Random Forest Regression tends to perform better than a single Decision Tree, especially when the dataset is complex and prone to overfitting.


Random Forest Regression :  14315329749.65445


Gradient Boosting Regression


from sklearn.tree import DecisionTreeRegressor

from sklearn.ensemble import RandomForestRegressor

from sklearn.ensemble import GradientBoostingRegressor


rf_regressor = GradientBoostingRegressor()

rf_regressor.fit(x_train, y_train)


# Predicting the sale prices using the test set

y_pred_rf = rf_regressor.predict(x_test)


# Gradient Boosting Regression MSE with test set

GBr = mean_squared_error(y_test, y_pred_rf)

print('Gradient Boosting Regression : ',GBr)


Now we've created a Gradient Boosting Regressor, fitted it to our training data (x_train and y_train), and used it to predict the target values for the test set (x_test). We stored the predictions in y_pred_rf.

Next, we calculated the Mean Squared Error (MSE) between the predicted values (y_pred_rf) and the actual target values (y_test) using scikit-learn's mean_squared_error function, and we assigned the result to GBr.

Here's the code breakdown:

Gradient Boosting is another ensemble learning method that combines multiple weak learners (typically decision trees) to create a strong predictive model. The GradientBoostingRegressor class in scikit-learn implements the Gradient Boosting algorithm for regression tasks.

The printed output will show the Mean Squared Error for Gradient Boosting Regression with the test set.

Comparing the MSE values for Decision Tree Regression, Random Forest Regression, and Gradient Boosting Regression can give us insights into which model performs better on this specific task. Gradient Boosting, like Random Forests, tends to perform well on various tasks, making it a powerful choice for regression problems.


Gradient Boosting Regression :  12029643835.717766


# Sample model scores (replace these with our actual model scores)

model_scores = {

    "Linear Regression": 9839952411.801708,

    "Descison Tree": 29698988724.82603,

    "Random Forest":14315329749.65445,

    "Gradient Boosting": 12029643835.717766

}


# Sort the model scores in ascending order based on their values (lower values first)

sorted_scores = sorted(model_scores.items(), key=lambda x: x[1])


# Display the ranking of the models

print("Model Rankings (lower values are better):")

for rank, (model_name, score) in enumerate(sorted_scores, start=1):

    print(f"{rank}. {model_name}: {score}")


We have collected the scores of the different regression models in a dictionary so that we can rank them.

The code above sorts the model scores in ascending order (lower values first) and then displays the ranking of the models from best to worst. In this ranking, models with lower scores (here, MSE values) are considered better performers, as lower values indicate predictions closer to the actual values.


Model Rankings (lower values are better):

1. Linear Regression: 9839952411.801708

2. Gradient Boosting: 12029643835.717766

3. Random Forest: 14315329749.65445

4. Descison Tree: 29698988724.82603


