Topic-18 | Evaluation Metrics for different model
Our previous topic was Linear Regression. Linear Regression is used to predict a number from numeric input values. The predicted number can be any value: positive or negative, high or low. We also looked at a graphical representation of linear regression: a plot on a two-dimensional plane, with the data points and a line that should have most of the data points close to it. To measure how well the line fits, we discussed the Mean Squared Error, which squares each error so that positive and negative errors do not cancel out to zero, and the Root Mean Squared Error.
In Colab we trained a model on a data set. An issue arose during execution: the resulting Mean Squared Error was very high. Today's topic starts with finding the reasons for this high value.
We will find the reasons for the high Mean Squared Error and then the actions that reduce it. Remember that a small amount of loss will remain even after we make every effort to reduce it. We can train a model with a number of regression techniques, so how do we decide which model performs better? The Mean Squared Error helps us choose: on the same test data, the model with the lower Mean Squared Error performs better, and vice versa.
The line we discussed in linear regression lies in two dimensions. In three-dimensional linear regression the line becomes a flat surface, like a whiteboard, which is called a plane in mathematics. Linear regression in more than three dimensions cannot be visualized by the human brain.
Even with n independent variables and one dependent variable, linear regression is still possible: we get a whiteboard-like structure (a hyperplane) extending in n directions.
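As a minimal sketch of regression with more than one independent variable, the example below fits a plane to synthetic data. The data, the coefficients 3, -2 and 5, and the variable names are assumptions made up for illustration; they are not from the lecture's data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative assumption): two independent variables and
# one dependent variable that lies exactly on a plane.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0   # plane: y = 3*x1 - 2*x2 + 5

model = LinearRegression().fit(X, y)

# The fitted model has one coefficient per independent variable plus an intercept.
print(model.coef_, model.intercept_)
```

With n independent variables the same code works unchanged; the model simply learns n coefficients instead of two.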
One thing we can do to reduce the Mean Squared Error is to bring the different attributes of the data to a comparable level. As we saw while training the model, the values of some attributes (features, columns) were very high and the values of others were very low. With such values, the model focuses more on the high values and learns more from them. This means the model learns much less from attributes whose values lie between 0 and 1. If the model had given due importance to these small values too, it could have performed better.
What is the cure for this problem? When we discussed data preprocessing, we mentioned data transformation and data reduction. We applied data reduction rigorously to the Titanic data set: using correlation, we separated the columns that were suitable for model training from those that were not. So dimensionality reduction in data preprocessing means that we drop the unnecessary columns from the input data set, so that our model learns from useful data only.
Now we put data transformation into practice. As discussed earlier, our data set contains some high values, in the thousands, and some low values between 0 and 1. Because of this huge difference, our model does not give due weight to the low values between 0 and 1. So we need to transform the data in such a way that the model gives equal importance to all attributes of the data set. Once the model pays equal attention to all attributes, it has more opportunity to learn. In short, data transformation is used to make the features of a data set equally important.
Suppose we have to measure the performance of the players in a team. Our data set records the weight of the players (60 kg, 65 kg, 55 kg, 50 kg) and their height (163 cm, 165 cm, 173 cm, and so on). The model trains only on the numbers and pays no attention to the units. It gives more weight to the high values and ignores, or pays less attention to, the low values. This affects the model badly.
To rectify this, we limit the values to a range and confine the model to values within that range.
Let us discuss a few methods of data transformation.
1. Z-score
The z-score is a variation of scaling that represents the number of standard deviations a value lies away from the mean. We would use the z-score to ensure our feature distributions have mean = 0 and std = 1. It is useful when there are a few outliers, but not so extreme that we need clipping.
2. Min-max scaling
Also known as rescaling or min-max normalization, this is the simplest method: it rescales the range of each feature to [0, 1] or [−1, 1]. The choice of target range depends on the nature of the data.
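The two transformations above can be sketched by hand with NumPy, using the player weights and heights from the earlier example. The helper names `z_score` and `min_max_scale` are hypothetical, introduced only for this illustration.

```python
import numpy as np

weights = np.array([60.0, 65.0, 55.0, 50.0])   # kg, from the player example
heights = np.array([163.0, 165.0, 173.0])      # cm, from the player example

def z_score(values):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    return (values - values.mean()) / values.std()

def min_max_scale(values, low=0.0, high=1.0):
    """Rescale linearly so the smallest value maps to low and the largest to high."""
    unit = (values - values.min()) / (values.max() - values.min())
    return low + unit * (high - low)

weights_z = z_score(weights)                     # mean 0, std 1
heights_01 = min_max_scale(heights)              # range [0, 1]
heights_pm1 = min_max_scale(heights, -1.0, 1.0)  # range [-1, 1]
print(weights_z, heights_01, heights_pm1)
```

After either transformation the units no longer matter: kilograms and centimetres end up on comparable scales.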
Now the question arises whether scaling or standardization should be applied only to the high-valued features or to all features in the data set, and whether the target variable should also be part of that scaling or standardization.
We now discuss different regression algorithms.
Suppose we have to predict the score of a batsman from a data set. We divide the data set by score range and hand each part to a different group of people to predict. Suppose the rows with scores from 1 to 10 are given to one group of 25 people, the rows with scores from 11 to 50 to another group, and so on. We define the range of scores that each particular group of people will predict.
Decision Tree Regression works in a similar way. Each branch of the tree predicts a different set of values: each branch covers a range of values from which the resulting value is predicted.
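The "one group per range" idea can be sketched with scikit-learn's DecisionTreeRegressor. The synthetic data and the limit of four leaves are assumptions chosen for illustration: with at most four leaves, the tree can output at most four distinct predictions, one per range of the input.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption): one feature, noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 5.0 * X[:, 0] + rng.normal(0, 1, size=200)

# Limit the tree to four leaves, i.e. four "groups", each responsible for a range.
tree = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0).fit(X, y)
pred = tree.predict(X)

n_distinct = len(np.unique(pred))
print(tree.get_n_leaves(), n_distinct)  # number of leaves, number of distinct predictions
```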
For now, the only thing to understand is that decision trees can be used for both regression and classification.
Random Forest regression is a form of decision tree regression. An example explains it: suppose we face a problem in a forest, and each tree in the forest is like a friend. We ask each tree, or friend, for a suggestion about the problem. We can simply say that it is like voting: we decide after considering the opinion of the majority of friends. (For regression, the forest averages the numeric predictions of its trees rather than counting votes.)
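To make the "ask every friend" idea concrete, the sketch below checks that a scikit-learn RandomForestRegressor prediction equals the average of its individual trees' predictions. The data set and the choice of 10 trees are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.5, size=200)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

X_new = X[:5]
# Ask each tree ("friend") separately, then average their answers.
per_tree = np.array([t.predict(X_new) for t in forest.estimators_])
manual_average = per_tree.mean(axis=0)
forest_pred = forest.predict(X_new)

print(np.allclose(manual_average, forest_pred))
```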
This is another extension of the Decision Tree Regressor (Gradient Boosting, which we use later in the notebook). In this regressor we act on the opinion given; if it does not work, we come back, take the opinion again, and avoid the actions that produced the wrong results.
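The "come back and correct what went wrong" idea can be sketched as boosting on residuals: each new small tree is trained on the errors left by the trees before it. This is a simplified hand-written illustration for squared error, not scikit-learn's GradientBoostingRegressor implementation; the data, the learning rate of 0.5 and the 20 rounds are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption): a curved, noisy relationship.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = 10.0 * np.sin(X[:, 0]) + rng.normal(0, 1, size=300)

learning_rate = 0.5
prediction = np.full_like(y, y.mean())          # start from a constant guess
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(20):
    residual = y - prediction                   # what is still wrong
    small_tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction = prediction + learning_rate * small_tree.predict(X)  # correct it a little
    mse_history.append(np.mean((y - prediction) ** 2))

print(mse_history[0], mse_history[-1])          # training MSE before and after boosting
```

Each round reduces the training error a little, which is why boosting combines many weak trees into one stronger model.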
Now we move to the Colab notebook. Our practice session starts with the question of which features scaling should be applied to. Normally scaling is applied to all features, including the target feature. However, in this notebook scaling is applied only to the input features; scaling all features including the target will be our task in the assignment.
OBJECTIVE 2: MACHINE LEARNING
Next, I will feed these features into various regression algorithms to determine the best performance using a simple framework: Split, Fit, Predict, Score It.
Target Variable Splitting
We will split the full dataset into input and target variables. The inputs are also called feature variables; the output is the target variable.
# Split data to be used in the models
# Create matrix of features
x = full_data.drop('Price', axis=1)  # grabs everything else but 'Price'
# Create target variable
y = full_data['Price']  # y is the column we're trying to predict
Before we train the models, it's
essential to split our data into training and testing
sets. This ensures that we have a separate dataset to
evaluate the performance of our trained models. The common
practice is to use a certain portion of our data for training (e.g., 70-80%)
and the remaining portion for testing (e.g., 20-30%)
from sklearn import preprocessing
# Fit the scaler to the features, then use it to transform them
pre_process = preprocessing.StandardScaler().fit(x)
x_transform = pre_process.transform(x)
Now we apply feature scaling to our feature matrix x using the StandardScaler from scikit-learn. Feature scaling is a common preprocessing step in machine learning; StandardScaler standardizes the features so that each has a mean of 0 and a standard deviation of 1. This helps especially with algorithms that are sensitive to the scale of the input features.
In the code above, we first create an instance of StandardScaler and fit it to our data (x) using the fit method. After that, we apply the transformation to the feature matrix using the transform method, which gives us x_transform with the scaled features.
Remember that using the same scaling parameters for both the training and testing sets ensures that the features are scaled consistently, which is crucial for accurate evaluation of model performance.
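The notebook fits the scaler on the full feature matrix before splitting. A common alternative, sketched below under the assumption of synthetic data, is to learn the scaling parameters from the training rows only and then reuse them for the test rows, so that both sets share one set of parameters and the test set does not influence them.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features (illustrative assumption): one large-valued, one in [0, 1].
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(50_000, 10_000, 1000), rng.uniform(0, 1, 1000)])
y = rng.normal(0, 1, 1000)  # placeholder target, not used for scaling

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=101
)

scaler = StandardScaler().fit(X_train)      # learn mean/std from the training rows only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)    # reuse the same mean/std for the test rows

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```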
# pipe = make_pipeline(StandardScaler(), LogisticRegression())
# pipe.fit(X_train, y_train)
The commented-out code above shows how to create a machine learning pipeline with scikit-learn's make_pipeline function. The pipeline combines feature scaling using StandardScaler() with a model. Note that LogisticRegression is a classifier; since our target, Price, is a continuous number, a regressor such as LinearRegression belongs in the pipeline here.
Let's break down the code:
- make_pipeline(StandardScaler(), LogisticRegression()): creates a pipeline that first applies StandardScaler() for feature scaling and then fits the model on the scaled features. The pipeline ensures that the same feature scaling is applied to both the training and testing data.
- pipe.fit(X_train, y_train): fits the pipeline to our training data X_train and the corresponding target variable y_train, so the scaler and the model are trained together as part of the pipeline.
After we run the fit method, the pipeline (pipe) is trained and ready to make predictions on new data, or to be evaluated on the test set (X_test and y_test).
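For a continuous target, a pipeline of this shape could look like the following sketch, which swaps in LinearRegression as the final step. The synthetic data and coefficients are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative assumption).
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(50_000, 10_000, 500), rng.uniform(0, 1, 500)])
y = 20.0 * X[:, 0] + 100_000.0 * X[:, 1] + rng.normal(0, 1_000, 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=101
)

pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)           # the scaler is fitted on X_train only
y_pred = pipe.predict(X_test)        # X_test is scaled with the stored parameters
r2 = pipe.score(X_test, y_test)      # R-squared on the test set
print(r2)
```

Because the scaler lives inside the pipeline, there is no separate transform call to forget when predicting on new data.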
# x represents the features
x_transform.shape
x_transform
Here we check the shape and the contents of the feature matrix x_transform after applying the StandardScaler transformation. x_transform is the scaled version of our original feature matrix x.
The output shows the shape of x_transform, a 2-dimensional NumPy array with the same number of rows as the original feature matrix x and one column per feature.
The contents of x_transform are the scaled values of our original features, where each column has a mean of 0 and a standard deviation of 1. The exact values depend on the distribution of our original features.
[Output of the above code: the scaled feature matrix]
Our values have now been standardized to a mean of 0 and a standard deviation of 1, and no very high values remain.
y  # y represents the target
y.shape
(5000,)
# Use x and y variables to split the data into train and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_transform, y, test_size=.10, random_state=101)
We use the transformed feature matrix x_transform and the target variable y to split our data into training and testing sets with scikit-learn's train_test_split function. This gives us separate datasets for training and evaluating our machine learning models.
After running this code, we have the following datasets:
- x_train: The transformed feature matrix for training our machine learning models.
- x_test: The transformed feature matrix for evaluating our trained models.
- y_train: The target variable corresponding to the training data.
- y_test: The target variable corresponding to the testing data.
Now we can use x_train and y_train to train our models and then evaluate their performance on x_test and y_test. The test_size parameter controls the proportion of data that goes into the testing set. Here it is 0.10, so 10% of the data is used for testing and the remaining 90% for training. The random_state parameter is set to 101, an arbitrary seed that ensures reproducibility. We can change it to any other value, or set it to None for a different random split on every run.
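As a quick check of what test_size and random_state do, the sketch below splits stand-in arrays with 5000 rows (the row count reported for y above). The arrays themselves are placeholders, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(5000 * 2).reshape(5000, 2)   # stand-in for x_transform
y = np.arange(5000)                        # stand-in for y

split_a = train_test_split(X, y, test_size=0.10, random_state=101)
split_b = train_test_split(X, y, test_size=0.10, random_state=101)
x_train, x_test, y_train, y_test = split_a

# 90% of the rows go to training, 10% to testing.
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# The same random_state reproduces exactly the same split.
same_split = np.array_equal(split_a[3], split_b[3])
print(same_split)
```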
LINEAR REGRESSION
Model Training
# Fit
# Import model
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Create instance of model
lin_reg = LinearRegression()
# Pass training data into model
lin_reg.fit(x_train, y_train)
# pipe = make_pipeline(StandardScaler(), LinearRegression())
# pipe.fit(x_train, y_train)
LinearRegression()
We are fitting a linear regression model to our training data using scikit-learn.
The code above creates an instance of the LinearRegression model and fits it to the scaled training data (x_train) along with the corresponding target variable (y_train). This trains the model to learn the relationship between the features and the target variable.
We can now use the trained lin_reg model to make predictions on new data or evaluate its performance on the test set.
The commented-out make_pipeline code is an alternative that bundles the StandardScaler step and the linear regression model into one object. Since x_train has already been scaled and lin_reg has already been trained on it, there is no need to fit the pipeline on the same data (x_train and y_train) as well; we can use lin_reg directly for further analysis.
Prediction
# Predict
y_pred = lin_reg.predict(x_test)
print(y_pred.shape)
print(y_pred)
We have used the trained linear regression model (lin_reg) to make predictions on the test data (x_test). The predicted values are stored in the variable y_pred.
The first print statement shows the shape of y_pred, a 1-dimensional NumPy array with one predicted value per sample in the test data, so the number of elements in y_pred equals the number of samples in x_test. The second print statement displays the predicted values themselves.
We can use these predicted values to evaluate the model's performance by comparing them to the true target values (y_test).
import matplotlib.pyplot as plt
import seaborn as sns
# Actual values on the x-axis, predicted values on the y-axis
sns.scatterplot(x=y_test, y=y_pred, color='blue', label='Actual Data points')
# Ideal line: points where the prediction equals the actual value
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', label='Ideal Line')
plt.legend()
plt.show()
We use the Seaborn library to create a scatter plot comparing the actual target values (y_test) with the predicted values (y_pred) from our linear regression model. We also add a red line representing the ideal line, where the predicted values perfectly match the actual values.
In the scatter plot, each blue point represents one sample from the test set: its x-coordinate is the actual target value (y_test) and its y-coordinate is the predicted value (y_pred).
If the points lie close to the red line, the model's predictions are close to the actual values, indicating a good fit. If the points are scattered away from the red line, the model's predictions deviate from the actual values.
A visualization like this gives us a quick understanding of how well our linear regression model is performing. For a more quantitative evaluation, we can use metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), or R-squared (coefficient of determination). These metrics summarize how far the predictions are from the actual values and how much of the variation in the data the model captures.
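The three metrics named above are available in sklearn.metrics. The sketch below computes them on four made-up value pairs, small enough to verify by hand; the numbers are illustrative and unrelated to the notebook's data.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 7.0, 9.0]    # made-up actual values
y_hat = [2.0, 5.0, 8.0, 11.0]    # made-up predictions; errors are 1, 0, 1, 2

mse = mean_squared_error(y_true, y_hat)    # (1 + 0 + 1 + 4) / 4 = 1.5
mae = mean_absolute_error(y_true, y_hat)   # (1 + 0 + 1 + 2) / 4 = 1.0
r2 = r2_score(y_true, y_hat)               # 1 - 6 / 20 = 0.7
print(mse, mae, r2)
```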
import numpy as np
# Combine actual and predicted values side by side
results = np.column_stack((y_test, y_pred))
# Print the results as a two-column table
print("Actual Values | Predicted Values")
print("-----------------------------")
for actual, predicted in results:
    print(f"{actual:14.2f} | {predicted:12.2f}")
We combine the actual target values (y_test) and the predicted values (y_pred) side by side using NumPy's column_stack function. The code above then prints the results as a formatted table.
The output is a table with two columns: the actual target values on the left and the corresponding predicted values on the right. The numbers are formatted with two decimal places for readability.
This side-by-side comparison lets us see how well our model's predictions align with the actual target values. If the predicted values are close to the actual values, we should see similar numbers in both columns. Substantial differences indicate that the model is not performing well on certain samples or needs further improvement.
Residual Analysis
Residual analysis in linear regression is a way to check how
well the model fits the data. It involves looking at the differences
(residuals) between the actual data points and the predictions from the model.
In a good model, the residuals should be randomly scattered
around zero on a plot. If there are patterns or a fan-like shape, it suggests
the model may not be the best fit. Outliers, points far from the others, can
also affect the model.
Residual analysis helps ensure the model's accuracy and whether
it meets the assumptions of linear regression. If issues are found, adjustments
to the model may be needed to improve its performance.
# Take the actual values from y_test; after the printing loop above,
# the loop variable named actual holds only the last single value
actual = np.asarray(y_test)
residual = actual - y_pred.reshape(-1)
print(residual)
We compute the residuals by subtracting the predicted values (y_pred) from the actual target values (actual) and store the result in the residual array. Residuals are the differences between the actual values and the corresponding predicted values.
In the code, y_pred.reshape(-1) ensures that the predicted values have the same shape as actual, so they can be subtracted directly. The result is an array of residuals with one element per sample.
By examining the values in the residual array, we can see how well the model is performing. Ideally, the residuals are close to zero, indicating that the model's predictions are accurate. Positive residuals mean that the model underestimates the target variable, while negative residuals mean that it overestimates.
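The sign convention and the role of reshape(-1) can be seen on a tiny made-up example. The four values below are illustrative; the predictions are deliberately stored as a column vector to show what goes wrong without the reshape.

```python
import numpy as np

actual = np.array([100.0, 200.0, 300.0, 400.0])             # made-up actual values
predicted = np.array([[90.0], [210.0], [300.0], [350.0]])   # column vector, shape (4, 1)

# Without reshape, NumPy broadcasting produces a 4x4 grid instead of 4 residuals.
wrong_shape = (actual - predicted).shape

# With reshape(-1), both arrays are 1-dimensional and subtract element by element.
residual = actual - predicted.reshape(-1)

underestimated = int((residual > 0).sum())   # prediction below the actual value
overestimated = int((residual < 0).sum())    # prediction above the actual value
print(wrong_shape, residual, underestimated, overestimated)
```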
Analysing the distribution of residuals and looking for patterns can
help us identify areas where the model might be performing poorly and guide us in making further
improvements to our model or data preprocessing.
If we want to get additional information about the overall performance
of the model, we may consider computing metrics such as Mean Squared Error
(MSE) or Mean Absolute Error (MAE) using the residuals. These metrics provide a
quantitative measure of how well the model is predicting the target variable
across the entire dataset.
# Distribution plot for residuals (difference between actual and predicted values)
sns.distplot(residual, kde=True)
We use Seaborn's distplot function to create a distribution plot of the residuals, the differences between the actual target values and the corresponding predicted values from our linear regression model. Note that distplot is deprecated in recent Seaborn releases, where histplot(residual, kde=True) draws the same kind of plot.
The distplot function creates a histogram of the residuals, and with kde=True it also overlays a kernel density estimate (KDE) to visualize the shape of the distribution.
In the plot, the x-axis represents the range of residual values, and the y-axis shows the density of each residual value. The plot shows how the errors made by our linear regression model are distributed. Ideally, the residuals are centered around zero with a symmetric distribution, indicating that the model's predictions are unbiased. Deviations from this pattern may indicate areas where the model is not performing well.
Common patterns to look for in the distribution plot include:
- Symmetry: A symmetric distribution around zero suggests the model is making unbiased predictions.
- Skewness: If the distribution is skewed, the model is systematically overestimating or underestimating the target variable.
- Outliers: Unusually large or small residuals (outliers) may indicate specific data points that the model is struggling to predict accurately.
By examining the distribution of residuals, we can gain insights into the overall performance of our model and identify potential areas of improvement.
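Besides reading symmetry off the plot, skewness can also be expressed as a single number. The sketch below uses the standard moment-based definition; the helper name `skewness` and both sample arrays are assumptions for illustration.

```python
import numpy as np

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by std cubed."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered ** 3) / values.std() ** 3

symmetric = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])      # mirror image around zero
right_tailed = np.array([-1.0, -1.0, 0.0, 0.0, 10.0])  # one large positive residual

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
print(skew_symmetric, skew_right)
```

A value near zero matches a symmetric plot; a clearly positive or negative value matches a tail on the right or left.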
The plot shows that the residuals of our model are not skewed, as the distribution is centered. Note, however, the scale of the x and y axes: the values are shown in powers of 10^6. This means the difference between actual and predicted values is still large, although it has been reduced to some extent, which is good.
Model Evaluation
Linear Regression
# Score It
from sklearn.metrics import mean_squared_error
print('Linear Regression Model')
# Results
print('--'*30)
# mean_squared_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
# Print evaluation metrics
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
We evaluate the performance of our linear regression model using the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE). Both are common measures of how well a regression model's predictions match the actual target values.
The mean_squared_error function from scikit-learn calculates the MSE between the actual target values (y_test) and the predicted values (y_pred). MSE is the average squared difference between predicted and actual values, so it penalizes larger errors more heavily.
The RMSE is simply the square root of the MSE. It expresses the typical magnitude of the errors in the same units as the target variable, which makes it easier to interpret and to relate to the scale of the original target variable.
Together, the MSE and RMSE show how well the model performs on the test data. Smaller values indicate better performance, because the model's predictions are closer to the actual values.
Having quantified the performance of our linear regression model, we can compare it to other models or assess its suitability for our specific task.
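Written as formulas, the two metrics described above are as follows. The symbols are notation introduced here for illustration: n is the number of test samples, y_i an actual value, and ŷ_i the corresponding prediction.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```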
Linear Regression Model
------------------------------------------------------------
Mean Squared Error: 9839952411.801708
Root Mean Squared Error: 99196.53427313732
# Linear Regression Model
# ------------------------------------------------------------
# Mean Squared Error: 10100187858.864885
# Root Mean Squared Error: 100499.69083964829
# 10170939558
The commented-out lines appear to record the results of another run of the same model:
- Mean Squared Error (MSE): 10,100,187,858.86
- Root Mean Squared Error (RMSE): 100,499.69
The MSE is the average squared difference between the actual and predicted target values; here that average squared difference is quite large.
The RMSE, the square root of the MSE, estimates the typical error in the same units as the target variable. Here the model's predictions are off by roughly 100,499.69 in the units of the target variable.
Note that these error values depend on the scale and units of the target variable. If the target variable takes large values, large error metrics are not unusual.
The last value, 10170939558, is recorded in the notebook without any context.
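As a quick arithmetic check on the two sets of numbers above, the sketch below confirms that each reported RMSE is the square root of its MSE and expresses the difference between the two MSE values as a relative change. The labels for the two runs are descriptive assumptions; the numbers are copied from the outputs above.

```python
import math

# (MSE, RMSE) pairs copied from the printed output and the commented-out output.
runs = {
    "printed run": (9839952411.801708, 99196.53427313732),
    "commented-out run": (10100187858.864885, 100499.69083964829),
}
for name, (mse, rmse) in runs.items():
    print(name, math.sqrt(mse), rmse)

# Relative reduction in MSE from the commented-out run to the printed run.
relative_reduction = 1 - runs["printed run"][0] / runs["commented-out run"][0]
print(f"{relative_reduction:.2%}")
```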
s = 10100187858 - 9839952411
print(s)
260235447
y_train.shape
(4500,)
Decision Tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
rf_regressor = DecisionTreeRegressor()
rf_regressor.fit(x_train, y_train)
# Predicting the SalePrices using test set
y_pred_rf = rf_regressor.predict(x_test)
# Decision Tree Regression MSE on the test set
DTr = mean_squared_error(y_test, y_pred_rf)
print('Decision Tree Regression : ', DTr)
We create a Decision Tree Regressor and fit it to our training data (x_train and y_train). We then use it to predict the target values for the test set (x_test) and store the predictions in y_pred_rf. (The variable names rf_regressor and y_pred_rf are reused in the following cells, although this model is a single decision tree.)
Finally, we calculate the Mean Squared Error (MSE) between the predicted values (y_pred_rf) and the actual target values (y_test) using scikit-learn's mean_squared_error function and assign the result to DTr. The printed output shows this MSE on the test set.
Keep in mind that Decision Trees have some limitations, such as a tendency to overfit, and may not always provide the best performance for regression tasks. Random Forests are an extension of Decision Trees that often generalize and predict better. To try Random Forest Regression, we can use RandomForestRegressor from scikit-learn in the same manner.
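The tendency to overfit can be made visible by comparing the error on the training data with the error on the test data. The sketch below does this for an unrestricted tree on synthetic noisy data; the data and seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 2.0, size=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=101
)

# An unrestricted tree can keep splitting until it memorizes the training rows.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
train_mse = mean_squared_error(y_train, tree.predict(X_train))
test_mse = mean_squared_error(y_test, tree.predict(X_test))
print(train_mse, test_mse)   # training error is far below the test error
```

A large gap between the two numbers is the typical sign of overfitting; options such as max_depth limit how far the tree can grow.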
Decision Tree Regression :  31316806651.827576
Random Forest
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
rf_regressor = RandomForestRegressor()
rf_regressor.fit(x_train, y_train)
# Predicting the SalePrices using test set
y_pred_rf = rf_regressor.predict(x_test)
# Random Forest Regression MSE on the test set
RFr = mean_squared_error(y_test, y_pred_rf)
print('Random Forest Regression : ', RFr)
Now we create a Random Forest Regressor, fit it to our training data (x_train and y_train), and use it to predict the target values for the test set (x_test). The predictions are stored in y_pred_rf.
Next, we calculate the Mean Squared Error (MSE) between the predicted values (y_pred_rf) and the actual target values (y_test) using scikit-learn's mean_squared_error function and assign the result to RFr. The printed output shows this MSE on the test set.
Random Forest Regression is an ensemble learning method that combines multiple decision trees to improve predictive performance and reduce overfitting. The RandomForestRegressor class in scikit-learn implements the Random Forest algorithm for regression tasks.
Comparing the MSE values for Decision Tree Regression and Random Forest Regression shows which model performs better on this specific task. Generally, Random Forest Regression tends to perform better than a single Decision Tree, especially when the dataset is complex and a single tree is prone to overfitting.
Random Forest Regression :  14315329749.65445
Gradient Boosting Regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
rf_regressor = GradientBoostingRegressor()
rf_regressor.fit(x_train, y_train)
# Predicting the SalePrices using test set
y_pred_rf = rf_regressor.predict(x_test)
# Gradient Boosting Regression MSE on the test set
GBr = mean_squared_error(y_test, y_pred_rf)
print('Gradient Boosting Regression : ', GBr)
Now we create a Gradient Boosting Regressor, fit it to our training data (x_train and y_train), and use it to predict the target values for the test set (x_test). The predictions are stored in y_pred_rf.
Next, we calculate the Mean Squared Error (MSE) between the predicted values (y_pred_rf) and the actual target values (y_test) using scikit-learn's mean_squared_error function and assign the result to GBr. The printed output shows this MSE on the test set.
Gradient Boosting is another ensemble learning method that combines multiple weak learners (typically decision trees) to create a strong predictive model. The GradientBoostingRegressor class in scikit-learn implements the Gradient Boosting algorithm for regression tasks.
Comparing the MSE values for Decision Tree Regression, Random Forest Regression, and Gradient Boosting Regression shows which model performs better on this specific task. Gradient Boosting, like Random Forests, tends to perform well on a variety of tasks, which makes it a strong choice for regression problems.
Gradient Boosting Regression :  12029643835.717766
# Sample model scores (replace these with our actual model scores)
model_scores = {
    "Linear Regression": 9839952411.801708,
    "Decision Tree": 29698988724.82603,
    "Random Forest": 14315329749.65445,
    "Gradient Boosting": 12029643835.717766,
}
# Sort the model scores in ascending order based on their values (lower values first)
sorted_scores = sorted(model_scores.items(), key=lambda x: x[1])
# Display the ranking of the models
print("Model Rankings (lower values are better):")
for rank, (model_name, score) in enumerate(sorted_scores, start=1):
    print(f"{rank}. {model_name}: {score}")
We collect the MSE scores of the different regression models in a dictionary, sort them in ascending order (lower values first), and display the ranking.
The code first sorts the model scores and then prints the models from best to worst. Models with lower MSE values are the better performers, because their predictions are closer to the actual values. The Decision Tree score here differs from the value printed earlier (31316806651.827576), presumably because the tree was fitted again without a fixed random_state.
Model Rankings (lower values are better):
1. Linear Regression: 9839952411.801708
2. Gradient Boosting: 12029643835.717766
3. Random Forest: 14315329749.65445
4. Decision Tree: 29698988724.82603
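Instead of copying scores by hand, the same Split, Fit, Predict, Score It framework can be run in a loop that fills the dictionary automatically. The sketch below uses synthetic data and fixed random_state values (both assumptions for illustration) so that repeated runs give the same ranking.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=101
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training set and score it on the same test set.
model_scores = {
    name: mean_squared_error(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in models.items()
}

ranking = sorted(model_scores.items(), key=lambda item: item[1])
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.4f}")
```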