TOPIC 20 | Model Evaluation Metrics


AI Free Basic Course | Lecture 20 | Model Evaluation Metrics

Let us start the lecture with accuracy, a topic that is already familiar to most of the audience. If the accuracy is high, we assume the model is performing well, and vice versa. Before we proceed, it is worth mentioning that accuracy applies to classification and not to regression, so today's discussion covers only classification models. What does evaluation mean? For example, we give a quiz after teaching just to check that you have understood what we taught; that is student evaluation, and the same idea applies to an AI model: we evaluate the model after training it. There are different methods of training a model, a few of which we have discussed earlier, while others are still to come. For example, we have discussed logistic regression, random forest, gradient boosting and decision trees for classification. After building, say, four such models, we evaluate their performance through accuracy.

Accuracy is a valid measure for evaluating classification models. Sometimes, however, it is not a reliable measure on its own. For example, suppose we design a classifier that always produces a positive result, or, inverting it, a classifier that always produces a negative result. A model that behaves this way has not really learned anything, so it hardly qualifies as a model trained through machine learning.

Suppose your input data comprises two classes: 90% of the records are positive and 10% are negative. If the model simply predicts the majority class, it will be right 90 times out of 100, so its accuracy is 90%. Can we say that it is an excellent model?

For example, suppose our input data consists of 90 healthy persons and 10 ill persons, and we have trained our model on this data. If the model always predicts "healthy", then on the 90 healthy persons its output is correct every time, so its accuracy on them is 100%.

If the ill persons are also included, the accuracy of this model drops to 90%. Can we say it is a good model? Remember that a model can be good or not good, but we should avoid simply calling it a bad model. Input data like this, comprising 90% healthy and 10% ill persons, is called imbalanced data.

Accuracy as a measure for evaluating a classification model fails when the input dataset is imbalanced. If accuracy fails, what alternatives are available to us for evaluating the model?

The difference between imbalance and skewness is that when your variable or feature is categorical, an uneven distribution is called imbalance, while skewness describes uneven distributions in numerical data.

What does balanced data mean, then? It is not always a 50/50 split. It depends on the result you are trying to derive and on the impact the imbalance has on that result. Even a 70/30 or 60/40 split can behave as balanced data. If our focus is on COVID-positive cases, which are a small percentage compared to COVID-negative cases, we judge the balance of the data by the significance of those cases for our task, which can be quite different from a 50/50 split.

When the input data is imbalanced, accuracy can be a dangerous measure.
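As a small illustration of the danger, here is a hedged sketch with hypothetical labels, mirroring the 90 healthy / 10 ill example above: a "model" that always predicts the majority class still reaches 90% accuracy while never detecting a single ill person.

from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 90 healthy persons (0) and 10 ill persons (1)
y_true = [0] * 90 + [1] * 10

# A "model" that blindly predicts the majority class for everyone
y_pred = [0] * 100

# Accuracy looks impressive even though no ill person is ever detected
print(accuracy_score(y_true, y_pred))   # 0.9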

We will therefore study the following evaluation methods for cases where accuracy is not the desired measure. These methods of evaluating a classifier are called classification metrics.

We will start from a simple example and the basic concepts. Examples of binary classification are Smoker / Non-Smoker and COVID-positive / COVID-negative. We classify variables that have only two categories or labels, such as cat and not-cat; this is called binary classification. We start with binary classification and will gradually move towards multi-class classification.

In binary classification the following four outcomes are important. In classification there is an actual label and a predicted label. Suppose we have an image of a cat that is labelled as cat; this is the actual label. When this image is given to the model and the model predicts it as a cat, that is the predicted label. The performance of the model is evaluated by comparing the actual labels with the predicted labels.

Now the model has to predict an image that is actually labelled as cat. We are interested in the model predicting it as an image of a cat, so the positive class is "cat".

True Positive

·       When the model predicts an actual image of a cat as a cat, and the cat is our class of interest, the scenario is called a True Positive.

True Negative

·       When the model predicts an actual image of a non-cat as non-cat, the scenario is called a True Negative.

The above two scenarios are the ones we want from the model.

False Positive

·       When the model predicts an actual image of a non-cat as a cat, the scenario is called a False Positive.

False Negative

·       When the model predicts an actual image of a cat as non-cat, the scenario is called a False Negative.
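To make the four outcomes concrete, here is a small hedged sketch with made-up labels, where "cat" is the positive class and the four counts are tallied directly from actual and predicted labels:

# Hypothetical actual and predicted labels; "cat" is the positive class
actual    = ["cat", "cat", "not cat", "cat", "not cat", "not cat"]
predicted = ["cat", "not cat", "not cat", "cat", "cat", "not cat"]

tp = sum(a == "cat"     and p == "cat"     for a, p in zip(actual, predicted))  # 2
tn = sum(a == "not cat" and p == "not cat" for a, p in zip(actual, predicted))  # 2
fp = sum(a == "not cat" and p == "cat"     for a, p in zip(actual, predicted))  # 1
fn = sum(a == "cat"     and p == "not cat" for a, p in zip(actual, predicted))  # 1

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)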

Let us relate False Positives to a real-life example of fraud detection. We are assigned the task of designing a fraud-detection model. After we build a model with good accuracy, it reports that a fraudulent transaction has occurred. When we investigate, we find that there was no fraud: the model, tuned for accuracy alone and not evaluated deeply, raised a fraud alert for a transaction that was not fraudulent. That is a False Positive: the model predicted something that did not actually happen. Another example is a fire alarm ringing when there is no fire.

Another example concerns False Negatives. A model diagnoses a patient as having no cancer when the patient actually has cancer. We need to evaluate the model by asking how many times it responds in such a False Negative way; our focus must be on correctly diagnosing the disease.


 


In the figure given below, actual values are plotted along the x-axis and predicted values along the y-axis.

CONFUSION MATRIX

Rule of thumb to understand the scenario is that:

·       If the model predicts an actual Negative as Positive, it is a False Positive.

·       If the model predicts an actual Positive as Negative, it is a False Negative.

·       If a Positive actual and the prediction match, it is a True Positive.

·       If a Negative actual and the prediction match, it is a True Negative.

Actual class was positive and the class predicted by the model was also positive.

It falls in the True Positive category.


Actual class was positive and the class predicted by the model was negative.

It falls in the False Negative category.


Actual class was negative and the class predicted by the model was positive.

It falls in the False Positive category.


Actual class was negative and the class predicted by the model was negative.

It falls in the True Negative category.

There are 165 test samples in the example given above. As you know, when we train a model we separate the data into train and test sets. These 165 samples represent the test data that were given to the model, and the results are shown above in the form of a confusion matrix. Rather than judging the model with a single overall accuracy figure, we will evaluate it using the confusion matrix, class by class.

CONFUSION MATRIX

Of the actual Positive cases in the 165 rows, the model predicted 100 as Positive.

If a Positive actual and the prediction match, it is a True Positive.

So these 100 cases are True Positives.

Of the actual Negative cases in the 165 rows, the model predicted 50 as Negative.

If a Negative actual and the prediction match, it is a True Negative.

So these 50 cases are True Negatives.

Of the actual Positive cases in the 165 rows, the model predicted 5 as Negative.

If the model predicts an actual Positive as Negative, it is a False Negative. So these 5 cases are False Negatives.


Of the actual Negative cases in the 165 rows, the model predicted 10 as Positive.

If the model predicts an actual Negative as Positive, it is a False Positive. So these 10 cases are False Positives.



Now let us connect this scenario with the accuracy formula discussed above. Using the confusion matrix, accuracy is all the True Positives plus all the True Negatives, divided by the total number of predictions.
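Plugging the numbers from the matrix above into that formula: Accuracy = (100 + 50) / 165 ≈ 0.91, or about 91%.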

Precision is True Positives divided by the sum of True Positives and False Positives. In other words, precision tells us how many of the positive predictions made by the model are actually correct.


Let us relate this to the fraud-detection example discussed above. The False Positives were the cases where the model flagged fraud when there was none. That is not a desirable situation: it creates panic, and the service is no longer seen as reliable. Our purpose is to reduce the predictions that fall into the False Positive category. So even if the accuracy of our model is 96%, we also need to calculate its precision. Precision tells us how well the model keeps False Positives down: the higher the precision, the safer we are in delivering the model for its intended use.

This can be understood with another simple calculation. If the False Positives are zero, precision is 100%, or 1. As the precision score drops, it reveals the presence of False Positives in the model's output. If the precision score is 50%, only half of the model's positive predictions are actually positive, and the model is not good at predicting the positive class. So precision tells us about the capability of the model to predict the positive class.


Remember that both precision and recall are concerned with the positive class, since we framed our classification task around identifying the positive class. False Negatives are actual positives that the model predicted as negative. For example, the model is given 10 actual images of cats and we assess how many of them it predicted as cats. If the model predicted 8 images as cats (True Positives) and missed 2 (False Negatives), the model recalled 80% of the positive cases. Remember the difference between precision and recall: precision looks at the model's positive predictions, while recall assesses the model against the actual positive class.
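Using the counts from the 165-sample confusion matrix above (TP = 100, FP = 10, FN = 5), here is a quick sketch of the two formulas:

tp, fp, fn = 100, 10, 5   # counts from the confusion matrix above

precision = tp / (tp + fp)   # 100 / 110 ≈ 0.91: of all positive predictions, how many were correct
recall    = tp / (tp + fn)   # 100 / 105 ≈ 0.95: of all actual positives, how many were found

print(f'Precision: {precision:.2f}, Recall: {recall:.2f}')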

Because the True Positive and True Negative counts in the data above are high, and the False Positive and False Negative counts are low, we can already say at a glance that the model performs well. That is the beauty of the confusion matrix: you can assess the performance of a model just by looking at it.
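1. Logistic Regression

Model Training

The evaluation code below works with predictions stored in y_pred_log_reg, so a logistic regression model is assumed to have been trained first on the same x_train / y_train split used for the other models. The original training cell is not shown in this post; a minimal sketch of what it might look like is given here:

from sklearn.linear_model import LogisticRegression

print('Logistic Regression')
# Create instance of model (assumed settings; the original notebook's parameters are not shown)
log_reg = LogisticRegression(max_iter=1000)

# Pass training data into model
log_reg.fit(x_train, y_train)

# Predictions used in the evaluation code below
y_pred_log_reg = log_reg.predict(x_test)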

Model Evaluation

from sklearn.metrics import precision_score, recall_score, confusion_matrix
import seaborn as sns

# Calculate precision and recall
precision = precision_score(y_test, y_pred_log_reg)
recall = recall_score(y_test, y_pred_log_reg)

# Print the results
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print("--"*30)

# Calculate confusion matrix
confusion = confusion_matrix(y_test, y_pred_log_reg)
print(confusion)
sns.heatmap(confusion, annot=True, fmt="d")

 


The confusion_matrix function computes a confusion matrix for the predicted labels (y_pred_log_reg) compared to the true labels (y_test). A confusion matrix is a table that describes the performance of a classification model by showing the counts of true positive, true negative, false positive, and false negative predictions.
In the confusion matrix shown above, the True Positive (47) and True Negative (99) counts are high, while the False Positive (8) and False Negative (24) counts are low, so the model looks good at first glance.
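Reading the counts off that matrix, the reported scores can be checked by hand: precision = 47 / (47 + 8) ≈ 0.85 and recall = 47 / (47 + 24) ≈ 0.66, which match the values printed by the evaluation code.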
2. Decision Tree

Model Training

from sklearn.tree import DecisionTreeClassifier

 

print('Decision Tree Classifier')

# Create instance of model

Dtree = DecisionTreeClassifier()

 

# Pass training data into model

Dtree.fit(x_train, y_train)

 

This line fits (trains) the decision tree classifier model using the training data (x_train as the input features and y_train as the corresponding target labels). The fit method is used to train the model on the provided training data.
Model Evaluation

from sklearn.metrics import accuracy_score

# prediction from the model

y_pred_Dtree = Dtree.predict(x_test)

# Score It

 

print('Decision Tree Classifier')

# Accuracy

print('--'*30)

Dtree_accuracy = round(accuracy_score(y_test, y_pred_Dtree) * 100,2)

print('Accuracy', Dtree_accuracy,'%')

Decision Tree Classifier

------------------------------------------------------------

Accuracy 79.21 %

The code above calculates the accuracy of the Decision Tree Classifier on the test dataset and prints the result; a step-by-step breakdown is given later in this post. Next, we calculate precision, recall, and the confusion matrix for the Decision Tree predictions:

from sklearn.metrics import precision_score, recall_score, confusion_matrix

# Calculate precision and recall

precision = precision_score(y_test, y_pred_Dtree)

recall = recall_score(y_test, y_pred_Dtree)

Dtree_accuracy = round(accuracy_score(y_test, y_pred_Dtree) * 100,2)

 

# Print the results

print(f'Precision: {precision:.2f}')

print(f'Recall: {recall:.2f}')

 

print("--"*30)

# Calculate confusion matrix

confusion = confusion_matrix(y_test, y_pred_Dtree)

 

sns.heatmap(confusion, annot=True, fmt="d")


Precision: 0.72

Recall: 0.77

3. Random Forest

Model Training

from sklearn.ensemble import RandomForestClassifier

 

print('Random Forest Classifier')

# Create instance of model

rfc = RandomForestClassifier()

 

# Pass training data into model

rfc.fit(x_train, y_train)

 

Random Forest Classifier

RandomForestClassifier

RandomForestClassifier()

 

The code above creates and trains a random forest classifier using scikit-learn's RandomForestClassifier; a step-by-step breakdown is given later in this post.

Model Evaluation

from sklearn.metrics import accuracy_score

# prediction from the model

y_pred_rfc = rfc.predict(x_test)

# Score It

 

print ('Random Forest Classifier')

# Accuracy

print('--'*30)

rfc_accuracy = round(accuracy_score(y_test, y_pred_rfc) * 100,2)

print('Accuracy', rfc_accuracy,'%')

 

Random Forest Classifier

------------------------------------------------------------

Accuracy 82.02 %


from sklearn.metrics import precision_score, recall_score, confusion_matrix

# Calculate precision and recall

precision = precision_score(y_test, y_pred_rfc)

recall = recall_score(y_test, y_pred_rfc)

 

# Print the results

print(f'Precision: {precision:.2f}')

print(f'Recall: {recall:.2f}')

print("--"*30)

# Calculate confusion matrix

confusion = confusion_matrix(y_test, y_pred_rfc)

sns.heatmap(confusion, annot=True, fmt="d")

 

Precision: 0.79

Recall: 0.75



4. Gradient Boosting Classifier

Model Training

from sklearn.ensemble import GradientBoostingClassifier

 

print('Gradient Boosting Classifier')

# Create instance of model

gbc = GradientBoostingClassifier()

 

# Pass training data into model

gbc.fit(x_train, y_train)

 

Gradient Boosting Classifier

GradientBoostingClassifier

GradientBoostingClassifier()

 



Model Evaluation

from sklearn.metrics import accuracy_score

# prediction from the model

y_pred_gbc = gbc.predict(x_test)

# Score It

 

print('Gradient Boosting Classifier')

# Accuracy

print('--'*30)

gbc_accuracy = round(accuracy_score(y_test, y_pred_gbc) * 100,2)

print('Accuracy', gbc_accuracy,'%')

 

Gradient Boosting Classifier

------------------------------------------------------------

Accuracy 84.27 %

from sklearn.metrics import precision_score, recall_score, confusion_matrix

# Calculate precision and recall

precision = precision_score(y_test, y_pred_gbc)

recall = recall_score(y_test, y_pred_gbc)

 

# Print the results

print(f'Precision: {precision:.2f}')

print(f'Recall: {recall:.2f}')

print("--"*30)

# Calculate confusion matrix

confusion = confusion_matrix(y_test, y_pred_gbc)

sns.heatmap(confusion, annot=True, fmt="d")

 

sns.countplot(x="Survived", data=full_data, palette="Blues"); plt.show()

The accuracy scores of the four models come out as follows:

  • Logistic Regression: 82.02%
  • Decision Tree Classifier: 80.34%
  • Random Forest Classifier: 82.58%
  • Gradient Boosting Classifier: 84.27%
The Logistic Regression evaluation code shown earlier uses scikit-learn (sklearn) to calculate precision, recall, and the confusion matrix for a binary classification problem. Here's a breakdown of what each part of the code does:

1.     Import necessary libraries:

from sklearn.metrics import precision_score, recall_score, confusion_matrix

This line imports the specific metrics and functions needed from scikit-learn.

2.     Calculate precision and recall:

precision = precision_score(y_test, y_pred_log_reg)

recall = recall_score(y_test, y_pred_log_reg)

The precision_score and recall_score functions are used to compute the precision and recall scores, respectively. These metrics are commonly used in binary classification tasks to evaluate the performance of the model.

 

3.     Print the results:

print(f'Precision: {precision:.2f}')

print(f'Recall: {recall:.2f}')

            This code prints the calculated precision and recall scores with two decimal places.

4.     Calculate the confusion matrix:

confusion = confusion_matrix(y_test, y_pred_log_reg)

5.     Visualize the confusion matrix:

sns.heatmap(confusion, annot=True, fmt="d")

This code uses a heatmap visualization to display the confusion matrix. The annot=True argument adds the actual counts to the heatmap cells, and fmt="d" specifies that the counts should be displayed as integers.

 

Precision: 0.85

Recall: 0.66

If we want our predictions to have few false positives, we focus on Precision.

If we want our predictions to have few false negatives, we focus on Recall.

 

 

The Decision Tree training code shown earlier demonstrates the use of scikit-learn's DecisionTreeClassifier for creating and training a decision tree classifier. Here's a breakdown of what the code does:

1.     Import the necessary library:

from sklearn.tree import DecisionTreeClassifier

This line imports the DecisionTreeClassifier class from scikit-learn, which is used to create a decision tree classifier model.

2.     Print a message indicating that a Decision Tree Classifier is being used:

print('Decision Tree Classifier')

This line simply prints a message to indicate that a Decision Tree Classifier is being used. It's a helpful way to provide information about the model being used in the code.

3.     Create an instance of the DecisionTreeClassifier:

Dtree = DecisionTreeClassifier()

This line creates an instance of the DecisionTreeClassifier class, which will be used as the decision tree model. The variable Dtree is used to reference this instance.

4.     Train the decision tree model:

Dtree.fit(x_train, y_train)

Decision Tree Classifier

 


1.   Import the necessary library:

from sklearn.metrics import accuracy_score

This line imports the accuracy_score function from scikit-learn, which is used to calculate the accuracy of classification models.


2.     Make predictions using the trained Decision Tree Classifier:

y_pred_Dtree = Dtree.predict(x_test)

This line uses the trained Dtree (Decision Tree Classifier) model to make predictions on the test dataset x_test, resulting in the predicted labels stored in y_pred_Dtree.

3.   Print a message indicating that the Decision Tree Classifier accuracy is being evaluated:

print('Decision Tree Classifier')

This line simply prints a message to indicate that the accuracy of the Decision Tree Classifier is being evaluated.

4.     Calculate and print the accuracy:

Dtree_accuracy = round(accuracy_score(y_test, y_pred_Dtree) * 100, 2)

print('Accuracy', Dtree_accuracy, '%')

The code calculates the accuracy of the Decision Tree Classifier by comparing the predicted labels (y_pred_Dtree) with the true labels from the test dataset (y_test). The accuracy_score function computes the accuracy as the fraction of correctly predicted instances out of the total number of instances in the test dataset. The result is rounded to two decimal places and printed as a percentage.
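As a tiny hedged illustration with made-up labels, accuracy_score simply returns the fraction of predictions that match the true labels:

from sklearn.metrics import accuracy_score

# 3 of these 4 hypothetical predictions match the true labels
print(accuracy_score([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75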

1.     Import the necessary libraries:

from sklearn.metrics import precision_score, recall_score, confusion_matrix

This line imports the specific metrics and functions needed from scikit-learn.

 

2.     Calculate precision and recall:

precision = precision_score(y_test, y_pred_Dtree)

recall = recall_score(y_test, y_pred_Dtree)

The precision_score and recall_score functions are used to compute the precision and recall scores based on the true labels (y_test) and the predicted labels from the Decision Tree Classifier (y_pred_Dtree).

 

3. Calculate accuracy:

Dtree_accuracy = round(accuracy_score(y_test, y_pred_Dtree) * 100, 2)

This line calculates the accuracy of the Decision Tree Classifier, similar to the previous code.

 

4.     Print the results:

print(f'Precision: {precision:.2f}')

print(f'Recall: {recall:.2f}')

These lines print the calculated precision and recall scores with two decimal places.


5.     Calculate the confusion matrix:

confusion = confusion_matrix(y_test, y_pred_Dtree)

The confusion_matrix function computes a confusion matrix for the predicted labels (y_pred_Dtree) compared to the true labels (y_test).

 

6.     Visualize the confusion matrix using a heatmap:

sns.heatmap(confusion, annot=True, fmt="d")

This code uses a heatmap visualization to display the confusion matrix. The annot=True argument adds the actual counts to the heatmap cells, and fmt="d" specifies that the counts should be displayed as integers.
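Because scikit-learn's confusion_matrix puts the actual classes on the rows and the predicted classes on the columns, the heatmap is easier to read with explicit axis labels. An optional sketch (not part of the original code):

import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(confusion, annot=True, fmt="d")
plt.xlabel("Predicted class")
plt.ylabel("Actual class")
plt.show()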

 

As shown above, the accuracy is 79.21%. Recall is greater than precision here, and the off-diagonal (misclassification) counts, 21 and 16, are relatively high, which pulls the accuracy down.


1.     Import the necessary library:

from sklearn.ensemble import RandomForestClassifier

This line imports the RandomForestClassifier class from scikit-learn, which is used to create a random forest classifier model.

2.     Print a message indicating that a Random Forest Classifier is being used:

print('Random Forest Classifier')

This line simply prints a message to indicate that a Random Forest Classifier is being used. It's a helpful way to provide information about the model being used in the code.


3.     Create an instance of the RandomForestClassifier:

rfc = RandomForestClassifier()

This line creates an instance of the RandomForestClassifier class, which will be used as the random forest model. The variable rfc is used to reference this instance.

4.     Train the random forest model:

rfc.fit(x_train, y_train)

This line fits (trains) the random forest classifier model using the training data (x_train as the input features and y_train as the corresponding target labels). The fit method is used to train the model on the provided training data.
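RandomForestClassifier() above is created with scikit-learn's default settings. As an optional, hedged variation (not part of the original code), hyperparameters such as the number of trees and a random seed can be passed when creating the instance, which makes the results reproducible:

from sklearn.ensemble import RandomForestClassifier

# Optional: fix the number of trees and the random seed for reproducible results
rfc = RandomForestClassifier(n_estimators=200, random_state=42)
rfc.fit(x_train, y_train)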

Random Forest

The Random Forest evaluation code shown earlier calculates the accuracy of the Random Forest Classifier on the test dataset and prints the result. Here's a breakdown of what the code does:

1.     Import the necessary library:

from sklearn.metrics import accuracy_score

This line imports the accuracy_score function from scikit-learn, which is used to calculate the accuracy of classification models.

2.     Make predictions using the trained Random Forest Classifier:

y_pred_rfc = rfc.predict(x_test)

This line uses the trained rfc (Random Forest Classifier) model to make predictions on the test dataset x_test, resulting in the predicted labels stored in y_pred_rfc.

3.     Print a message indicating that the Random Forest Classifier accuracy is being evaluated:

print('Random Forest Classifier')

This line simply prints a message to indicate that the accuracy of the Random Forest Classifier is being evaluated.

4.     Calculate and print the accuracy:

rfc_accuracy = round(accuracy_score(y_test, y_pred_rfc) * 100, 2)

print('Accuracy', rfc_accuracy, '%')

The code calculates the accuracy of the Random Forest Classifier by comparing the predicted labels (y_pred_rfc) with the true labels from the test dataset (y_test). The accuracy_score function computes the accuracy as the fraction of correctly predicted instances out of the total number of instances in the test dataset. The result is rounded to two decimal places and printed as a percentage.

 


The code calculates the precision, recall, and the confusion matrix for the predictions made by the Random Forest Classifier on a test dataset and then visualizes the confusion matrix using a heatmap. Here's a breakdown of what the code does:

1.     Import the necessary libraries:

from sklearn.metrics import precision_score, recall_score, confusion_matrix

This line imports the specific metrics and functions needed from scikit-learn.

 

2.     Calculate precision and recall:

precision = precision_score(y_test, y_pred_rfc)

recall = recall_score(y_test, y_pred_rfc)

The precision_score and recall_score functions are used to compute the precision and recall scores based on the true labels (y_test) and the predicted labels from the Random Forest Classifier (y_pred_rfc).

3.     Print the results:

print(f'Precision: {precision:.2f}')

print(f'Recall: {recall:.2f}')

These lines print the calculated precision and recall scores with two decimal places.

4.     Calculate the confusion matrix:

confusion = confusion_matrix(y_test, y_pred_rfc)

The confusion_matrix function computes a confusion matrix for the predicted labels (y_pred_rfc) compared to the true labels (y_test).

5. Visualize the confusion matrix using a heatmap:

sns.heatmap(confusion, annot=True, fmt="d")

This code uses a heatmap visualization to display the confusion matrix. The annot=True argument adds the actual counts to the heatmap cells, and fmt="d" specifies that the counts should be displayed as integers.

 

Recall and precision are very close to each other here, and the misclassification counts are lower, which is reflected in the higher accuracy.

The Gradient Boosting training code shown earlier demonstrates the use of scikit-learn's GradientBoostingClassifier for creating and training a gradient boosting classifier. Here's a breakdown of what the code does:

1.     Import the necessary library:

from sklearn.ensemble import GradientBoostingClassifier

This line imports the GradientBoostingClassifier class from scikit-learn, which is used to create a gradient boosting classifier model.

2.     Print a message indicating that a Gradient Boosting Classifier is being used:

print('Gradient Boosting Classifier')

This line simply prints a message to indicate that a Gradient Boosting Classifier is being used. It's a helpful way to provide information about the model being used in the code.

3.     Create an instance of the GradientBoostingClassifier:

gbc = GradientBoostingClassifier()

This line creates an instance of the GradientBoostingClassifier class, which will be used as the gradient boosting model. The variable gbc is used to reference this instance.

4.     Train the gradient boosting model:

gbc.fit(x_train, y_train)

This line fits (trains) the gradient boosting classifier model using the training data (x_train as the input features and y_train as the corresponding target labels). The fit method is used to train the model on the provided training data.
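Here, too, the classifier is created with default hyperparameters. As an optional, hedged variation (not part of the original code), the number of boosting stages and the learning rate can be set explicitly:

from sklearn.ensemble import GradientBoostingClassifier

# Optional: control the number of boosting stages and the learning rate
gbc = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, random_state=42)
gbc.fit(x_train, y_train)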

Gradient Boosting

The code shown earlier calculates the accuracy of the Gradient Boosting Classifier on the test dataset and prints the result. Here's a breakdown of what the code does:

1.     Import the necessary library:

from sklearn.metrics import accuracy_score

This line imports the accuracy_score function from scikit-learn, which is used to calculate the accuracy of classification models.

2.     Make predictions using the trained Gradient Boosting Classifier:

y_pred_gbc = gbc.predict(x_test)

This line uses the trained gbc (Gradient Boosting Classifier) model to make predictions on the test dataset x_test, resulting in the predicted labels stored in y_pred_gbc.

3.   Print a message indicating that the Gradient Boosting Classifier accuracy is being evaluated:

print('Gradient Boosting Classifier')

This line simply prints a message to indicate that the accuracy of the Gradient Boosting Classifier is being evaluated.

4.     Calculate and print the accuracy:

gbc_accuracy = round(accuracy_score(y_test, y_pred_gbc) * 100, 2)

print('Accuracy', gbc_accuracy, '%')

The code calculates the accuracy of the Gradient Boosting Classifier by comparing the predicted labels (y_pred_gbc) with the true labels from the test dataset (y_test). The accuracy_score function computes the accuracy as the fraction of correctly predicted instances out of the total number of instances in the test dataset. The result is rounded to two decimal places and printed as a percentage.

 

The code calculates the precision, recall, and the confusion matrix for the predictions made by the Gradient Boosting Classifier on a test dataset and then visualizes the confusion matrix using a heatmap. Here's a breakdown of what the code does:

1.     Import the necessary libraries:

from sklearn.metrics import precision_score, recall_score, confusion_matrix

This line imports the specific metrics and functions needed from scikit-learn.

2.     Calculate precision and recall:

precision = precision_score(y_test, y_pred_gbc)

recall = recall_score(y_test, y_pred_gbc)

The precision_score and recall_score functions are used to compute the precision and recall scores based on the true labels (y_test) and the predicted labels from the Gradient Boosting Classifier (y_pred_gbc).

3.     Print the results:

print(f'Precision: {precision:.2f}')

print(f'Recall: {recall:.2f}')

These lines print the calculated precision and recall scores with two decimal places.

4.     Calculate the confusion matrix:

confusion = confusion_matrix(y_test, y_pred_gbc)

The confusion_matrix function computes a confusion matrix for the predicted labels (y_pred_gbc) compared to the true labels (y_test).

5.     Visualize the confusion matrix using a heatmap:

sns.heatmap(confusion, annot=True, fmt="d")

This code uses a heatmap visualization to display the confusion matrix. The annot=True argument adds the actual counts to the heatmap cells, and fmt="d" specifies that the counts should be displayed as integers.

 

The values in the correctly predicted cells are higher than those in the misclassified cells, and the misclassification counts here are the lowest of all the models so far. Which cell matters most depends on whether we are more interested in the positive or the negative class, and we evaluate the confusion matrix accordingly.

 

The countplot code shown earlier uses Seaborn to create a count plot of the "Survived" variable in the full_data dataset, with the "Blues" color palette. The count plot is a type of bar plot that shows the number of occurrences of each unique value in a categorical variable. Here's a breakdown of what the code does:

1.     Import the necessary libraries:

import seaborn as sns

import matplotlib.pyplot as plt

These lines import the seaborn library for data visualization and the matplotlib.pyplot library for displaying the plot.

                                                                        

2.     Create a count plot:

sns.countplot(x="Survived", data=full_data, palette="Blues")

This line creates a count plot where the "x" axis represents the "Survived" variable, and the data for the plot is taken from the "full_data" dataset. The "Blues" color palette is specified to provide a color scheme for the plot.

3.     Show the plot:

plt.show()

This line displays the created count plot.
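The count plot is a quick visual check of how balanced the target classes are, which connects back to the imbalance discussion at the start of this lecture. As a hedged companion (assuming full_data is a pandas DataFrame, as in the notebook), the same information can be printed as proportions:

# Share of each class in the "Survived" column
print(full_data["Survived"].value_counts(normalize=True))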

# Sample model scores (replace these with your actual model scores)

model_scores = {

    "Logistic Regression": 82.02,

    "Decision Tree Classifier": 80.34,

    "Random Forest Classifier": 82.58,

    "Gradient Boosting Classifier": 84.27

This creates a dictionary called model_scores that stores the accuracy score for each classification model. It is a helpful way to keep track of the performance of multiple models; the accuracy scores are expressed as percentages.

Here's the breakdown of the dictionary:


    "Logistic Regression": 82.02,

    "Decision Tree Classifier": 80.34,

    "Random Forest Classifier": 82.58,

    "Gradient Boosting Classifier": 84.27

}

 

This dictionary can be useful for comparing the performance of different models at a glance. You can use this information to choose the best-performing model for your specific task or to make further improvements in your model selection and tuning.
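To compare the four models at a glance, the dictionary can also be plotted. A minimal sketch using matplotlib (assuming the model_scores dictionary defined above):

import matplotlib.pyplot as plt

# Bar chart of accuracy scores for the four classifiers
plt.figure(figsize=(8, 4))
plt.bar(list(model_scores.keys()), list(model_scores.values()), color="steelblue")
plt.ylabel("Accuracy (%)")
plt.title("Model comparison")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()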

 

