·    The mnist module provides the MNIST dataset, which is a collection of 60,000 28x28 grayscale images of handwritten digits, along with a test set of 10,000 images.

MNIST is a good dataset for beginners to practice building neural networks on. Many other datasets are available and can be used in the same way.

Datasets of this type are used to train neural networks that recognize handwritten digits, for example the digits on bank cheques.

·        The Sequential module is used to create a linear stack of layers, which is the most common way to build a neural network in Keras.

·        The Dense module creates a fully-connected layer, which is a layer where each neuron is connected to every neuron in the previous layer.

·        The to_categorical function converts a vector of integer labels into one-hot encoded vectors. This is the format that a multi-class loss such as categorical cross-entropy expects.

As the name suggests, before classification we convert the labels into categories written in a Yes (1) and No (0) form. The scenario here differs from binary classification, though: instead of two classes, we have 10 classes.

# splitting the data into test and train set

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()


Let us start with the basic dataset.

·       The line (X_train, Y_train), (X_test, Y_test) = mnist.load_data() loads the MNIST dataset, which comes already divided into a training set and a test set.

·       The mnist.load_data() function downloads the MNIST dataset if it is not already on your computer. It then returns the training set of 60,000 images and the test set of 10,000 images.

·       The X_train and X_test variables contain the images of the digits, and the Y_train and Y_test variables contain the labels of the digits. Each image is stored as a 28x28 NumPy array, and the labels are stored as integers from 0 to 9.

·       We have 10 classes, one for each digit, and the model has to identify which digit an image shows, whether it is 2, 8, 9, etc.

·       The mnist.load_data() function takes one optional argument, path, which is the file name under which the downloaded dataset is cached locally (mnist.npz by default). It has no shuffle or seed argument.

In this case, we do not pass any argument, so the data is returned in its stored order with the standard 60,000/10,000 split. Shuffling of the training samples happens later, during training: model.fit() shuffles the training data before each epoch by default.
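If a shuffled copy of the data is wanted before training, one option is to shuffle images and labels together with a seeded NumPy generator. The following is a minimal sketch on small stand-in arrays rather than the real MNIST data; the names images and labels and the seed value 42 are chosen only for illustration.

```python
import numpy as np

# Stand-in data: 6 tiny "images" (2x2) and their labels (illustrative only)
images = np.arange(6 * 2 * 2).reshape(6, 2, 2)
labels = np.array([0, 1, 2, 3, 4, 5])

# A seeded generator makes the shuffle reproducible
rng = np.random.default_rng(42)
order = rng.permutation(len(images))

# Apply the same permutation to images and labels so the pairs stay aligned
shuffled_images = images[order]
shuffled_labels = labels[order]

# Image k starts at the value 4 * k, so every image still matches its label
print(np.array_equal(shuffled_images[:, 0, 0], shuffled_labels * 4))
```

Using one permutation for both arrays is the important part: shuffling the two arrays independently would break the link between each image and its label.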

import matplotlib.pyplot as plt

# Number of digits to display
n = 10

# Create a figure to display the images
plt.figure(figsize=(20, 4))

# Loop through the first 'n' images
for i in range(n):
    # Create a subplot within the figure
    ax = plt.subplot(2, n, i + 1)

    # Display the original image
    plt.imshow(X_test[i].reshape(28, 28))

    # Set colormap to grayscale
    plt.gray()

    # Hide x-axis and y-axis labels and ticks
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

# Show the figure with the images
plt.show()

# Close the figure
plt.close()

  • Line 1: Imports the matplotlib.pyplot module.
  • Line 2: Defines the variable n to store the number of digits to display.
  • Line 3: Creates a figure to display the images.
  • Line 4: Starts a loop to iterate over the first n images in the X_test array.
  • Line 5: Creates a subplot within the figure for the current image.
  • Line 6: Displays the original image in the subplot.
  • Line 7: Sets the colormap to grayscale.
  • Line 8: Hides the x-axis and y-axis labels and ticks.
  • Line 9: Ends the loop.
  • Line 10: Shows the figure with the images.
  • Line 11: Closes the figure.

As mentioned in the last lecture, every image has a size. The image given above is 28 x 28 pixels. If we multiply the height and width in pixels, we get the total number of pixels in the picture. A fully connected layer cannot take the image as a two-dimensional grid; it expects a one-dimensional list of numbers. So we adopt a strategy here: we place all the pixels of the image in one straight, one-dimensional row.

Look at the diagram above:

We have an image of the digit 1 here. Each box in the 2-D matrix (having rows and columns) represents one pixel. We now convert the 2-dimensional image into a 1-dimensional array (a single row or column). This process is called flattening the image.

The 2-D matrix comprises 4 rows and 4 columns, which totals 16 boxes, so our 1-D array should also comprise 16 boxes. Similarly, an image of 28 x 28 pixels gives 784 pixels in the 1-D array.
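The 4 x 4 example can be reproduced in a few lines of NumPy. This is a small sketch with made-up pixel values (the numbers 0 to 15), meant only to show how flattening rearranges the boxes.

```python
import numpy as np

# A 4x4 "image": 4 rows and 4 columns, 16 pixels in total
image_2d = np.arange(16).reshape(4, 4)

# Flattening places the rows one after another in a single 1-D array
image_1d = image_2d.reshape(-1)

print(image_2d.shape)  # (4, 4)
print(image_1d.shape)  # (16,)
print(28 * 28)         # 784 pixels for an MNIST-sized image
```

The first pixel of the second row, image_2d[1, 0], ends up at position 4 of the flattened array, directly after the four pixels of the first row.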

Before the reshaping, we print the shapes of the original training data and labels using the code:

print("Previous X_train shape: {} \nPrevious Y_train shape:{}".format(X_train.shape, Y_train.shape))

In our training data above, we have 60,000 images, each 28 x 28 pixels in size, so X_train has the shape (60000, 28, 28). The target variable Y_train has the shape (60000,): it is a one-dimensional array holding one label for each of the 60,000 images.

The next piece of code reshapes the training and testing data to a flat format. The images are transformed from a 2D shape (height and width; MNIST images are grayscale, so there is no separate color-channel axis) to a 1D shape (a flattened array of pixel values). This is a common preprocessing step before feeding data into fully connected neural networks.

We are reshaping the training and testing data using the code:

X_train = X_train.reshape(60000, 784)

X_test = X_test.reshape(10000, 784)

Here, you're reshaping the X_train and X_test arrays so that each image is represented as a flattened array of length 784 (28x28 pixels). The number of rows in these reshaped arrays should match the number of samples you have in the dataset.

After this reshaping, the data is ready to be fed into a machine learning model that accepts flattened input features. Make sure to adjust your model architecture accordingly to handle the flattened input shape.
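The same reshaping can be written without hard-coding the number of samples. The sketch below uses a small stand-in batch of blank images instead of the real X_train; the name batch is an assumption made for illustration.

```python
import numpy as np

# Stand-in batch: 5 blank images of 28x28 pixels (the real X_train has 60,000)
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Keep the sample axis and let NumPy infer the flattened length with -1
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)  # (5, 784)
```

Writing batch.shape[0] instead of a fixed number such as 60000 keeps the code working when the number of samples changes.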

We have now flattened the data using the reshape function. The flattened data is then fed into the first layer of the neural network. The number of inputs to the first layer must equal the number of values being fed into it. For example, 784 pixel values are fed in, so the input layer must also have 784 inputs. After flattening the images and feeding them into the first layer of the neural network, there is another challenge: the identification of the images.

Min-Max Scaling

Our dataset contains different digits, and each digit has its own shape. We cannot define the curves of a digit ourselves, because everyone writes each digit in their own way. To meet this challenge, the model needs a clear picture of the strokes and boundaries of the digits. Every pixel in our images has an intensity between 0 and 255, where 0 represents black and 255 represents white.

Some parts of an image are pure black, some are pure white, and some lie in between, which we see as shades of grey. Scaling maps all of these intensities into the range 0 to 1: black stays 0, white becomes 1, and every grey shade becomes a fraction in between. The relative contrast between the pixels is preserved; only the numeric range changes.

In the given image, pixels with the value 0 form the background, pixels with values close to 255 form the body of the stroke, and pixels along the edges of the stroke take intermediate grey values.

Keeping the inputs in a small, uniform range makes training easier and helps with image identification at a later stage.

Before scaling, we convert the data to float32 (decimal) format. The pixels are stored as 8-bit integers, which cannot hold fractional values, so the in-place division used for scaling would fail without this conversion.

# Convert the data type of the images to float32

X_train = X_train.astype('float32')

X_test = X_test.astype('float32')

In the provided code, we are converting the data type of the image data in X_train and X_test arrays to float32. This type conversion is a common preprocessing step in machine learning to ensure that the data is in the appropriate format for numerical calculations.

The astype function is used to change the data type of the arrays. By converting the data type to float32, you're ensuring that the pixel values of the images are represented as floating-point numbers, which is the standard format for numerical computations in many machine learning frameworks.

This type conversion is important because some machine learning algorithms and neural network models require input data to be in a specific data type for accurate computations and training. float32 is a commonly used data type for this purpose as it provides a good balance between precision and memory usage.
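A practical reason for the conversion shows up as soon as the pixels are divided in place. The sketch below uses three stand-in pixel values; it illustrates NumPy's behaviour and is not part of the original notebook.

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype=np.uint8)

# In-place true division cannot store fractions in an integer array
try:
    pixels /= 255
    failed = False
except TypeError:
    failed = True

# After converting to float32 the same division works
pixels_f = pixels.astype('float32')
pixels_f /= 255

print(failed)          # True
print(pixels_f.dtype)  # float32
```

The astype call returns a new array, so the original uint8 array is left unchanged unless the result is assigned back, as the tutorial code does.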


# Normalize the pixel values to the range 0 to 1 (0 = black, 1 = white)

X_train /= 255

X_test /= 255


In the provided code, you are normalizing the pixel values of the image data in the X_train and X_test arrays to a range between 0 and 1. This normalization step is another important preprocessing technique often used when working with image data in machine learning.

The idea behind this normalization is to scale the pixel values so that they lie within the range [0, 1]. The original pixel values usually span from 0 (black) to 255 (white) in grayscale images. By dividing each pixel value by 255, you effectively rescale the values to be in the range [0, 1], which is more suitable for many machine learning algorithms and neural networks.

Normalizing the data helps in preventing issues related to varying scales of input features. It can also improve the convergence speed and stability of training processes, especially when using optimization algorithms like gradient descent.

Keep in mind that normalization is a crucial step, especially when dealing with neural networks, as it can have a significant impact on the model's performance and training dynamics.
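Dividing by 255 is a special case of min-max scaling, whose general form is (x - min) / (max - min). The sketch below uses a hypothetical helper, min_max_scale, and a few stand-in pixel values to show that the two agree when the minimum is 0 and the maximum is 255.

```python
import numpy as np

def min_max_scale(x, x_min, x_max):
    """Map values from [x_min, x_max] to [0, 1]."""
    return (x - x_min) / (x_max - x_min)

pixels = np.array([0, 64, 128, 255], dtype='float32')

scaled = min_max_scale(pixels, 0.0, 255.0)

# With x_min = 0 and x_max = 255 this is the same as dividing by 255
print(np.allclose(scaled, pixels / 255))  # True
print(scaled.min(), scaled.max())         # 0.0 1.0
```

The known range of 0 to 255 is used here rather than the minimum and maximum of each image, so every image is scaled in the same way.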

Processing the Target variable

As we know, we have 10 classes, and our target variable can hold any digit from 0 to 9. We now have to convert these labels into a form of 0s and 1s that the model can understand. The challenge is that we have 10 classes instead of 2, and two classes would have been easy to encode as 0 and 1. To see how this challenge is overcome, first have a glance at this pictorial representation.

Suppose we have an image of the digit 0. In its row, the first column holds the value 1 and the remaining values are 0. So in our target variable there is a vector of 10 values for every image, exactly one of which is 1. The location of that 1 tells us the identity of the digit.

For example, if the 1 is located in the column for digit 2, the image shows a 2.

To perform this type of encoding, we imported to_categorical above.

# Number of classes in the dataset

classes = 10

# Convert the labels to one-hot encoded format

Y_train = to_categorical(Y_train, classes)

Y_test = to_categorical(Y_test, classes)


# Print the shapes of the preprocessed training data and labels

print("New X_train shape: {} \nNew Y_train shape:{}".format(X_train.shape, Y_train.shape))

We are converting the labels of the dataset into one-hot encoded format. One-hot encoding is a common technique used to represent categorical variables (such as class labels) in a format that is suitable for machine learning algorithms.

In this code:

classes = 10


We've defined the variable classes to store the number of classes in our dataset (which is 10).

Y_train = to_categorical(Y_train, classes)

Y_test = to_categorical(Y_test, classes)

We simply pass the function our target variable, say Y_train, and our classes variable. It converts the labels into 0 and 1 form. This conversion is called one-hot encoding, in which the 1 marks the desired digit.

We've used the function to_categorical to convert the class labels in both Y_train and Y_test arrays into one-hot encoded format. One-hot encoding represents each class label as a binary vector where a '1' appears at the index corresponding to the class and '0' in other positions. This is often used when dealing with multi-class classification problems.

print("New X_train shape: {} \nNew Y_train shape:{}".format(X_train.shape, Y_train.shape))

Finally, we are printing the shapes of the pre-processed training data and the one-hot encoded labels to verify the changes.


New X_train shape: (60000, 784)

New Y_train shape:(60000, 10)


New X_train shape: (60000, 784)

We have converted our training data into a two-dimensional array holding 60,000 images, each a flat row of 784 pixel values.

New Y_train shape:(60000, 10)

It's important to note that the shape of our target variable Y_train, which was (60000,) before one-hot encoding, is now (num_samples, num_classes), where num_samples is the number of training samples, and num_classes is the number of classes (which is 10 in our case). The same applies to the shape of Y_test.

One-hot encoding is commonly used for classification tasks and is essential when training models like neural networks for multi-class classification, as it helps the model interpret class labels correctly during training.
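What one-hot encoding produces can be sketched in plain NumPy. The helper one_hot below is hypothetical and written only for illustration; it is not the Keras implementation of to_categorical, although it yields the same kind of matrix.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (num_samples, num_classes) matrix with a 1 at each label's index."""
    return np.eye(num_classes, dtype='float32')[labels]

labels = np.array([0, 2, 9])
encoded = one_hot(labels, 10)

print(encoded.shape)  # (3, 10)
print(encoded[1])     # 1 in column 2, zeros elsewhere

# argmax recovers the original integer labels
print(np.argmax(encoded, axis=1))  # [0 2 9]
```

The last line shows that the encoding loses no information: the position of the 1 in each row is the original label.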

This is the required shape of our data.

Setting up Hyper-parameters

As discussed earlier, hyperparameters are set before training; they are not learned from the dataset. We choose their values based on experience.

# Define the input size for each data sample (e.g., image pixels)

input_size = 784


# Specify the number of data samples to process in each batch

batch_size = 200


# Define the number of neurons in the first hidden layer

hidden1 = 400


# Define the number of neurons in the second hidden layer

hidden2 = 20


# Define the total number of classes/categories in the dataset

classes = 10


# Set the number of complete passes through the dataset during training

epochs = 5

We are defining various parameters and hyperparameters that will be used for training a neural network model. Here's a breakdown of each parameter:

1.     input_size: This parameter represents the number of features in each data sample. In our context, it corresponds to the number of pixels in a flattened image (28x28 = 784) since we flattened our images earlier.

2.     batch_size: This parameter determines the number of data samples that are processed in each iteration during training. It's a key factor in controlling memory usage and training efficiency. In our case, each training iteration will process 200 samples at a time. One epoch is complete after all 60,000 images have been processed once, which takes 60,000 / 200 = 300 iterations.

3.     hidden1: This parameter specifies the number of neurons in the first hidden layer of your neural network. The hidden layers are where the actual learning happens as the model extracts features and representations from the input data.

4.     hidden2: This parameter specifies the number of neurons in the second hidden layer. The architecture you're defining has two hidden layers: one with 400 neurons and the second with 20 neurons.

5.     classes: This parameter indicates the total number of classes or categories in your dataset. In your case, you're working with a dataset containing 10 classes.

6.     epochs: This parameter defines the number of complete passes through the entire training dataset during training. Each epoch consists of multiple iterations (mini-batches) where the model is updated based on the training data. In your case, you're training the model for 5 epochs.

These parameters are critical for configuring our neural network architecture and controlling the training process. The specific values chosen will affect the behaviour of the neural network and its performance on our task.
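The relationship between batch size, iterations and epochs can be checked with a little arithmetic. The sketch below uses the values from this section; the variable names iterations_per_epoch and total_updates are introduced here for illustration.

```python
import math

num_samples = 60000   # training images
batch_size = 200
epochs = 5

# One iteration processes one batch; one epoch covers all samples once
iterations_per_epoch = math.ceil(num_samples / batch_size)
total_updates = iterations_per_epoch * epochs

print(iterations_per_epoch)  # 300
print(total_updates)         # 1500
```

math.ceil is used because a final, smaller batch still counts as one iteration when the number of samples is not an exact multiple of the batch size.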


So far, nothing we have done contributes to building the architecture of the neural network. We have observed the data, preprocessed it, and set the hyperparameters.

Now we will define the architecture of the neural network.

Building the FCN Model

Before we proceed further, be aware that there are many types of neural networks. The one we are discussing here is one of the fundamental types, called the standard or fully connected neural network, as it was among the first to be built. The architecture we are displaying here was built in the year 1988.

# Create a Sequential model, which allows us to build a neural network layer by layer

model = Sequential()


# Add the first hidden layer with 'hidden1' neurons, using ReLU activation function

# The 'input_dim' specifies the input size for this layer

model.add(Dense(hidden1, input_dim=input_size, activation='relu'))

# output = relu(dot(W, input) + bias)


# Add the second hidden layer with 'hidden2' neurons, also using ReLU activation function

model.add(Dense(hidden2, activation='relu'))


# Add the output layer with 'classes' neurons, using softmax activation function

# Softmax activation ensures that the output values represent probabilities of each class

model.add(Dense(classes, activation='softmax'))


### Compilation ###


# Compile the model by specifying the loss function, optimizer, and evaluation metrics
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='sgd')


# Display a summary of the model architecture, showing the layers and parameter counts
model.summary()



We are building and compiling a neural network model using the Keras API. Here's a breakdown of each step:

In this part of the code:

model = Sequential()


We are creating a Sequential model. This type of model allows you to build a neural network layer by layer, sequentially adding one layer after another.

We can understand Sequential through an everyday example. Suppose Sequential is a box in which we stack different layers in sequence: first the input layer, then the first hidden layer, then the second hidden layer, and the output layer last.

We never write any code to connect these layers to each other. This is the specialty of the Sequential box: it connects each layer we add to the layer before it automatically.

So in the Sequential box we just stack all the layers. Running the TensorFlow compile command then configures how the stacked network will be trained.

We have named the Sequential object model because, once it is compiled, it is a model ready to perform the rest of the tasks. We will fit (train) and evaluate that model and make predictions with it.

We can say that TensorFlow gives us a Sequential box in which we stack neural network layers, and after compiling, this box is ready to be trained as a neural network.

model.add(Dense(hidden1, input_dim=input_size, activation='relu'))

# output = relu(dot(W, input) + bias)

As discussed above, there are many types of neural networks. The one we are discussing is the fully connected neural network, so the layer we add to our model is the fully connected layer, which Keras calls a Dense layer.

So the Dense layer represents a fully connected layer where each neuron is connected to every neuron in the previous layer.

The Sequential container is the big box, and each layer is a smaller box stacked inside it. Now we have to place neurons in each layer box. The type of neuron, that is, the activation function, depends on the type of layer. Hidden layers here use Rectified Linear Unit (ReLU) neurons. For the output layer, we first check whether the problem is regression or classification and choose the neurons accordingly.

We are adding the first hidden layer using the Dense layer. hidden1 specifies the number of neurons in this layer, which is 400, and the input_dim parameter is set to input_size, which is the number of features in each input sample (784 in our case). The activation function used for this layer is the Rectified Linear Unit (ReLU) activation.


So until now we have placed 400 neurons on first dense layer in our sequential container.
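The comment output = relu(dot(W, input) + bias) can be tried out directly in NumPy. The sketch below uses random stand-in weights rather than trained ones; the shapes follow the formula as written in the comment.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and replaces negative values with 0
    return np.maximum(z, 0)

rng = np.random.default_rng(0)

# Illustrative random values; a trained layer would have learned weights
x = rng.random(784)                        # one flattened image
W = rng.normal(0, 0.05, size=(400, 784))   # one row of weights per neuron
b = np.zeros(400)                          # one bias per neuron

output = relu(np.dot(W, x) + b)

print(output.shape)          # (400,)
print((output >= 0).all())   # True
```

Keras stores the weight matrix of a Dense layer with the shape (inputs, neurons) and multiplies in the other order, which gives the same result up to a transpose.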

model.add(Dense(hidden2, activation='relu'))


We are adding the second hidden layer using another Dense layer. This layer contains hidden2 neurons and also uses the ReLU activation function.


model.add(Dense(classes, activation='softmax'))


We are adding the output layer using a final Dense layer. This layer has classes neurons, which corresponds to the number of classes in your dataset. The activation function used here is the softmax activation function, which converts the output values into probabilities, ensuring that they sum up to 1.


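It must be remembered that softmax works on the whole group of neurons in a layer at once, because each output depends on all the values in the layer, while ReLU and sigmoid are applied to each neuron separately.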
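The difference between softmax and ReLU can be seen in a small NumPy sketch. The functions below are hand-written for illustration and the scores are made-up values; they are not taken from the trained model.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum avoids overflow and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0)

scores = np.array([2.0, 1.0, -1.0])

probs = softmax(scores)
print(probs.sum())        # 1.0 up to rounding
print(np.argmax(probs))   # 0

# ReLU treats each value on its own; softmax depends on all values together
print(relu(scores))       # [2. 1. 0.]
```

Changing one score changes every softmax output, because all outputs share the same denominator, whereas it changes only one ReLU output.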

So until now we have placed 400 neurons in the first dense layer, 20 neurons in the second dense layer, and as many softmax neurons as there are classes in the output layer of our Sequential container. The model has not been compiled yet.



model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='sgd')


After building the model architecture, we proceed to the compilation step. At this stage the model is given three settings: the loss function, the evaluation metric, and the optimizer.

loss='categorical_crossentropy'

The loss function tells us whether the neural network is working accurately or not. It is the quantity that is minimized while training the model. Categorical cross-entropy is appropriate for multi-class classification with one-hot encoded labels. It measures the distance between the actual labels and the predicted probabilities: the larger the distance, the larger the loss, and vice versa.
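Assuming the loss in question is categorical cross-entropy, the usual choice for one-hot labels with a softmax output, its behaviour can be sketched in NumPy. The function below is hand-written for illustration and is not the Keras implementation; the probabilities are made-up values.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0]])     # true class is 1

good = np.array([[0.05, 0.90, 0.05]])    # confident and correct
bad = np.array([[0.70, 0.20, 0.10]])     # puts most weight on class 0

loss_good = categorical_crossentropy(y_true, good)
loss_bad = categorical_crossentropy(y_true, bad)

print(round(loss_good, 3))  # 0.105
print(round(loss_bad, 3))   # 1.609
```

Because the label vector is one-hot, only the probability assigned to the true class contributes to the loss: the loss is small when that probability is close to 1 and grows as it moves towards 0.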



metrics=['accuracy']

This specifies the evaluation metric you want to track during training. Here, you're using accuracy.

optimizer='sgd'

This specifies the optimization algorithm to use during training. To minimize the loss we use gradient descent, here in the form of sgd, which stands for Stochastic Gradient Descent. Remember that the optimizers offered by Keras are all variants of gradient descent, each with different or added features.
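The core update of gradient descent fits in a few lines. The sketch below minimizes a made-up one-parameter loss with an exact gradient; in stochastic gradient descent the gradient is instead estimated from one mini-batch at a time, and the learning rate of 0.1 is an arbitrary value chosen for illustration.

```python
# Minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0              # starting value of the parameter
learning_rate = 0.1  # step size (a hyperparameter)

for step in range(100):
    grad = 2 * (w - 3)
    w = w - learning_rate * grad   # move against the gradient

print(round(w, 4))  # 3.0
```

Each step moves the parameter a little in the direction that reduces the loss, so w approaches 3, the value where this loss is smallest.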


Finally, you display a summary of the model's architecture and parameter counts using model.summary(). Your model is now defined, compiled, and ready for training!

Following summary would be generated after running the above code:

·       There are 400 neurons in first hidden layer.

·       There are 20 neurons in second hidden layer.

·       There are 10 neurons in output layer.

·       There are 322,230 parameters. These are the 322,230 numbers (weights and biases) that are updated during training; this is how the model learns. After running the desired number of epochs, the loss should have been reduced considerably.

As mentioned earlier, the number of inputs to our first layer must equal the number of values in the flattened 1-D image, which is 784. So, we have defined it in our code as follows:

model.add(Dense(hidden1, input_dim=input_size, activation='relu'))

# output = relu(dot(W, input) + bias)

It means the input layer receives 784 values, the first hidden layer contains 400 neurons, the second hidden layer contains 20 neurons and the output layer contains 10 neurons.


Parameters Calculations

Remember these are Parameters not hyper parameters. Parameters are continuously updated by gradient descent and through learning by observing the data.


output = relu(dot(W, input) + bias)

(400*784) + 400 = 314,000, or about 0.3 million parameters, for the first hidden layer

·       784 values are fed into our input layer.

·       There are 400 neurons in our first hidden layer.

·       In a fully connected (Dense) neural network, the neurons in each layer are connected to all the neurons in the previous layer, so each of the 400 neurons in the hidden layer is connected to each of the 784 inputs. The total number of connections, and therefore of weights, is found by multiplying (400*784).

·       400 bias values are added, one per neuron in the hidden layer. The bias makes it easier for the model to learn.
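The same calculation can be repeated for every layer to arrive at the total reported by the model summary. The helper dense_params below is hypothetical and introduced only for this sketch.

```python
def dense_params(n_inputs, n_neurons):
    # One weight per connection plus one bias per neuron
    return n_inputs * n_neurons + n_neurons

layer_sizes = [784, 400, 20, 10]  # input, hidden1, hidden2, output

# Pair each layer with the one that follows it
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [314000, 8020, 210]
print(sum(per_layer))  # 322230
```

The first hidden layer accounts for almost all of the parameters, because it is the layer connected to the 784 input values.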

For comparison, GPT-4 is reported, though not officially confirmed, to have about 1,760,000,000,000 = 1.76 trillion parameters.

Training The Model

# Import necessary libraries

from time import time


# Record the current time to measure training time

tic = time()


# Fit the model on the training data
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=1)


# Record the time after model training

toc = time()


# Calculate and print the time taken for model training

print("Model training took {} secs".format(toc - tic))


# Next step: test the trained model

In the provided code, we are fitting the compiled model to the training data, measuring the time it takes for training, and preparing to test the trained model on the testing data.

In this part of the code:

1.     You import the time function from the time module to measure the training time.

2.     You record the current time using tic before starting the training of the model.

3.     You use the fit method to train the model on the training data (X_train and Y_train). The parameters used in fit are batch_size (200), epochs (5), and verbose (1). The verbose parameter controls the verbosity level during training. A value of 1 means progress updates will be printed for each epoch.

4.     You record the time after the model training is complete using toc.

5.     You calculate and print the time taken for model training by subtracting tic from toc.
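The tic/toc pattern works for timing any block of code. The sketch below times a stand-in workload instead of model.fit, and it uses perf_counter from the same time module, a clock intended for measuring short intervals; this is a substitution made for illustration, not what the tutorial code uses.

```python
from time import perf_counter

# Record the start time
tic = perf_counter()

# Stand-in workload in place of model.fit(...)
total = sum(i * i for i in range(100000))

# Record the end time and compute the difference
toc = perf_counter()
elapsed = toc - tic

print("Work took {:.4f} secs".format(elapsed))
```

The measured time includes everything between tic and toc, so only the code being measured should be placed between the two calls.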

The closing comment in the code marks the next step: testing the trained model on the testing data, which means evaluating its performance on the test set and making predictions with it.

Testing The Model

# Import the necessary libraries

from sklearn.metrics import accuracy_score

import numpy as np

import matplotlib.pyplot as plt


# Predict probabilities for the test set using the trained model

y_pred_probs = model.predict(X_test, verbose=0)

y_pred = np.where(y_pred_probs > 0.5, 1, 0)


# Calculate and print the test accuracy using predicted and true labels

test_accuracy = accuracy_score(Y_test, y_pred)

print("\nTest accuracy: {}".format(test_accuracy))

Test accuracy: 0.9089

# Import the necessary libraries

from sklearn.metrics import accuracy_score

import numpy as np

import matplotlib.pyplot as plt


# Predict probabilities for the test set using the trained model

y_pred_probs = model.predict(X_test, verbose=0)

y_pred = np.where(y_pred_probs > 0.5, 1, 0)

In this part of the code:

1.     You import the accuracy_score function from sklearn.metrics to compute the accuracy of your model's predictions.

2.     You import numpy as np for array manipulation.

3.     You import matplotlib.pyplot as plt for potential visualization (although it seems you haven't used it in this code snippet).

4.     You use the trained model (model) to predict probabilities for the test set (X_test) using the predict method. The verbose parameter is set to 0 to suppress progress updates.

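5.     You threshold the predicted probabilities at 0.5 to convert each row of 10 probabilities into a vector of 0s and 1s. An entry becomes 1 if its probability is greater than 0.5 and 0 otherwise. Since the probabilities in a row sum to 1, at most one entry can exceed 0.5, so the result has the same form as the one-hot labels. If no class exceeds 0.5, the whole row is 0 and the sample is counted as incorrect.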


# Calculate and print the test accuracy using predicted and true labels

test_accuracy = accuracy_score(Y_test, y_pred)

print("\nTest accuracy: {}".format(test_accuracy))

In this part of the code:

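1.     You calculate the test accuracy by comparing the predicted labels (y_pred) with the true labels (Y_test) using the accuracy_score function. Because both are matrices of 0s and 1s, a sample counts as correct only when its entire row matches.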

2.     You print the test accuracy.

This code snippet essentially evaluates the performance of your trained model on the testing dataset by calculating the test accuracy. The accuracy score is a common metric used to assess how well a classification model performs on unseen data. It represents the ratio of correctly predicted instances to the total number of instances in the test set.
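An alternative to thresholding, shown here only as a sketch, is to take the most probable class with argmax. The probabilities below are made-up values chosen to include one row in which no class exceeds 0.5; they are not outputs of the trained model.

```python
import numpy as np

# Illustrative softmax outputs for 3 samples and 3 classes
probs = np.array([
    [0.80, 0.15, 0.05],   # confident: class 0
    [0.40, 0.35, 0.25],   # no class above 0.5; the largest is class 0
    [0.10, 0.30, 0.60],   # class 2
])
true_labels = np.array([0, 0, 2])
true_one_hot = np.eye(3)[true_labels]

# Thresholding: the second row becomes all zeros and counts as wrong
thresholded = np.where(probs > 0.5, 1, 0)
acc_threshold = np.mean(np.all(thresholded == true_one_hot, axis=1))

# argmax: always picks the most probable class
acc_argmax = np.mean(np.argmax(probs, axis=1) == true_labels)

print(acc_threshold)  # 2 of 3 correct
print(acc_argmax)     # 3 of 3 correct
```

With argmax every sample receives exactly one predicted digit, so the accuracy obtained this way can only be equal to or higher than the thresholded one.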

# Define a mask for selecting a range of indices (20 to 49)

mask = range(20, 50)


# Select the first 20 samples from the test set for visualization

X_valid = X_test[0:20]

actual_labels = Y_test[0:20]


In this part of the code:

1.     You define a mask variable using the range function. This mask is a sequence of indices from 20 to 49. It could be used to select a specific subset of data points from the test set, but it is not used in the code shown here.

2.     You select the first 20 samples from the test set (X_test) and assign them to the variable X_valid. These samples will be used for visualization and prediction.

3.     You also select the corresponding true labels for the selected samples from the test set and store them in the actual_labels variable.


# Predict probabilities for the selected validation samples

y_pred_probs_valid = model.predict(X_valid)

y_pred_valid = np.where(y_pred_probs_valid > 0.5, 1, 0)


1/1 [==============================] - 0s 88ms/step

In this part of the code:

1.     You use the trained model (model) to predict probabilities for the selected validation samples (X_valid) using the predict method.

2.     You threshold the predicted probabilities at 0.5 to obtain a vector of 0s and 1s for each selected validation sample, as before: an entry becomes 1 if its probability is greater than 0.5 and 0 otherwise.

At this point, you have predicted labels for a subset of the test data, and you can proceed to visualize the results.


# Set up a figure to display images
n = len(X_valid)
plt.figure(figsize=(20, 4))

for i in range(n):
    # Display the original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(X_valid[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display the predicted digit
    predicted_digit = np.argmax(y_pred_probs_valid[i])
    ax = plt.subplot(2, n, i + 1 + n)
    plt.text(0.5, 0.5, str(predicted_digit), fontsize=12, ha='center', va='center')
    plt.axis('off')

# Show the plotted images
plt.show()

# Close the plot
plt.close()



In this part of the code, we are setting up a figure to display original images from the validation subset and their corresponding predicted digits:

  1. You set the variable n to the number of samples in your validation subset.
  2. You use plt.figure(figsize=(20, 4)) to set up a figure for plotting the images. This specifies the size of the figure.
  3. You iterate through each sample in the validation subset (X_valid), displaying the original image and the corresponding predicted digit.
    • For the original image:
      • You use plt.subplot(2, n, i + 1) to create a subplot for the original image.
      • You display the image using plt.imshow after reshaping it to a 28x28 format.
      • You set the color map to grayscale using plt.gray() to display the image in grayscale.
      • You use ax.get_xaxis().set_visible(False) and ax.get_yaxis().set_visible(False) to hide the axes.
    • For the predicted digit:
      • You calculate the predicted digit using np.argmax(y_pred_probs_valid[i]), which gets the index of the maximum predicted probability.
      • You create a subplot for the predicted digit using plt.subplot(2, n, i + 1 + n).
      • You use plt.text to display the predicted digit as text in the center of the subplot. The ha and va parameters are set to 'center' for horizontal and vertical alignment, respectively.
      • You use plt.axis('off') to turn off the axes.
  4. After plotting all images, you use plt.show() to display the plotted images.
  5. Finally, you use plt.close() to close the plot.

This code snippet creates a visualization of the original images along with the predicted digits for each image. It's a useful way to visually inspect how well your model is performing on a subset of the validation data.

Suppose an input image of the digit 9 is given to the model. The softmax output assigns the highest probability to class 9, so np.argmax returns 9 as the predicted digit.


