AI Free Basic Course | Lecture 25 | Building & Training CNN | Live Session

Yesterday we discussed the Convolutional Neural Network (CNN), but we could not fully understand how the feature map is formed when a filter is applied to an image.

We studied that the Convolutional Neural Network (CNN) is better than the Artificial Neural Network (ANN). Now we explain which features make the Convolutional Neural Network better than the Artificial Neural Network.

Look at the image of a puzzle. We give the same puzzle to two different friends. One friend is given the puzzle blocks after they have been mixed up haphazardly and is asked to recognize the image by looking at each block. The other friend is given the same pieces in their arranged form and is asked to recognize the image.

The second friend starts looking at the puzzle from the upper left and glides his eyes over the image in slices, recognizing it at the end. During this process, he is viewing the image in small pieces.

The second friend will recognize the image more easily than the first.

The Artificial Neural Network works like the first friend in this example: we break the image into pixels before giving it to the network, just like the scattered blocks.

The Convolutional Neural Network works like the second friend: it recognizes the image by applying a filter to one small part of it at a time.

We take a picture and view it with the help of a magnifying glass, piece by piece. The process of viewing the image piece by piece is called rendering. During rendering, it is our choice to move in any direction, whether diagonal, vertical or horizontal. The decision about the direction is what we call filtration: we place a value of 1 in the direction in which we are applying our filter.

During rendering, the values of the filter are multiplied element-wise with the values of the patch being viewed and summed, and the result is written as the first entry of the feature map. The whole feature map is formed in this way, value by value, as the filter slides over the image.
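To make the arithmetic concrete, here is a single step of that sliding process in plain NumPy. The numbers are made up purely for illustration; the real image values come from the data and the real filter values from training.

import numpy as np

# A 3x3 patch of the image and a 3x3 vertical-edge filter (illustrative values)
patch = np.array([[10, 10, 0],
                  [10, 10, 0],
                  [10, 10, 0]])
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# One convolution step: multiply element-wise, then sum
value = np.sum(patch * kernel)
print(value)  # 30 -> the first entry of the feature map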

To understand this, let us take an example. Suppose we have small pieces shaped as triangles, circles and squares, and we pour them through a strainer whose holes only allow the triangle shapes to pass. A filter works the same way: applied vertically, it will pick out only the vertical lines, edges and shapes of the image.

As we discussed earlier, the image of a cat or dog is not actually an image of a cat or dog for the computer; an image is a large matrix containing numbers. If we want to identify a logo in a given picture, we can apply a binary classification model: by building a convolutional model and applying a sigmoid neuron at the end, the model will tell us whether it is the logo or not.

The artificial neural network converts the image pixels into a one-dimensional layer and takes all these pixels as input. For example, a 28 x 28 pixel image from the MNIST data is converted into 784 one-dimensional pixels. The problem with this network is that as the image size increases, the parameters increase and it becomes difficult to process. To deal with this problem, we use the Convolutional Neural Network, which extracts features from the image so that the model learns with fewer parameters. So we get a computationally less expensive model in the form of the convolutional network.

We will see the difference in computational power used by the Artificial Neural Network and the Convolutional Neural Network while practicing the code in Colab. We will see how the number of parameters is reduced in the Convolutional Neural Network compared to the Artificial Neural Network.

Another way to put it: in an Artificial Neural Network, we cut the image into vertical or horizontal strips while converting it to a one-dimensional layer. This process makes identification of the image difficult for the computer, although it does detect the image ultimately.

On the other side, the Convolutional Neural Network inherently sees the image more clearly, because the 2-dimensional structure of the image remains intact and the model detects the image easily. So the Convolutional Neural Network is inherently superior in computer vision.


In the image, a kernel is moving over the image and a feature map is formed as a result. This means the convolution process is being performed as the kernel or filter moves over the image frame by frame. During this process, the kernel detects features of the image, such as whether there are curved, diagonal or straight lines, and prints them onto the feature map.

It works as if you had a large map: you extract your desired location from it, draw that part on a separate page, and then pick the relevant information from it.

To make it more understandable, let us take another example. Suppose you, as the computer, have an image in front of you. During the convolution process, you first look at it through yellow glasses and select the parts of the image that look prominent through them. After that, you look at it through red glasses and select the parts that look prominent through those, and so on. So every filter picks out different features from the image.

It means that when we form the convolution layer in a convolutional neural network, every convolution neuron acts like the yellow or red glasses and detects different features of the image.

So each convolution neuron in the first layer stores separate features, and after combining these features a new image is formed and passed to the next layer.

The next layer scans it with another type of glasses; this layer now has more information, which makes it possible to detect features that could not be detected before. The function of the last layer is to decide whether this is the desired information or not.


Above is the architecture of a Convolutional Neural Network. There is an input image to which a filter is applied, and a feature map is formed as a result. Then we apply the pooling technique over the feature map. Pooling is the process of extracting the significant parts from the feature map. Suppose we are observing a 2 x 2 window in the feature map: from those four values we pick the most important one and pass it to the next layer.

What we are actually doing is picking out the important information step by step to make processing and computation easy. Initially we have a 16 x 16 image, which is gradually reduced. The information remains the same, but the dimensions are reduced.

At the next layer, the pooling process will again pick the significant features and leave the insignificant ones behind. At the end we will have an image with all the desired information but fewer dimensions.
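As a small illustration of max pooling with a 2 x 2 window (plain NumPy, with values invented for the example):

import numpy as np

# A 4x4 feature map reduced to 2x2 by taking the max of each 2x2 block
feature_map = np.array([[1, 3, 2, 0],
                        [5, 6, 1, 2],
                        [7, 2, 9, 4],
                        [0, 1, 3, 8]])

pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 2]
#  [7 9]]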

At last, we take the fully connected neural network layer and pass the information to it. Since the fully connected network is not able to extract the features itself, we pass it the information only after extracting the features with the convolution and pooling layers.

We know that the number of neurons in the fully connected output layer must be equal to the number of classes, and the softmax activation function is applied over this layer.

Let us summarize. In a Convolutional Network there are four types of layers.

1. Input layer

2. Convolution layer, which applies the filters.

3. Max Pooling layer, which reduces the dimensions.

4. Fully Connected Layer, which does the classification or regression (supervised learning).

At the output layer:

·       if the problem is binary classification, we apply a single sigmoid neuron;

·       if the problem is multi-class classification, we apply a softmax group;

·       if the problem is regression, we apply ReLU.

Every neuron has two parts. The first part does the linear calculation, multiplying the weights with the inputs and adding a bias. The second part is the activation function, which can be ReLU, sigmoid or softmax.

So in a linear neuron, we do only the linear calculation and give the output without applying an activation function.
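A toy example of the two parts of a neuron (the numbers are illustrative):

import numpy as np

# Part 1: the linear calculation z = w . x + b
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8, 0.2, -0.5])   # weights
b = 0.1                          # bias
z = np.dot(w, x) + b             # 0.4 - 0.2 - 1.0 + 0.1 = -0.7

# Part 2: the activation function, e.g. ReLU
a = max(0.0, z)                  # ReLU clips negatives to 0, so a = 0.0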

Let us go to the Colab notebook:

Today we will connect our Colab notebook in a different way.

1. We will click the Reconnect button and go to Change runtime type.






Today we will apply a CNN to the MNIST data and compare it with the Fully Connected Network we applied to the same data in the previous lecture.


import matplotlib.pyplot as plt

# Number of digits to display
n = 10

# Create a figure to display the images
plt.figure(figsize=(20, 4))

# Loop through the first 'n' images
for i in range(n):
    # Create a subplot within the figure
    ax = plt.subplot(2, n, i + 1)
    # Display the original image
    plt.imshow(X_test[i].reshape(28, 28))
    # Set colormap to grayscale
    plt.gray()
    # Hide x-axis and y-axis labels and ticks
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

# Show the figure with the images
plt.show()

# Close the figure
plt.close()

 

  • Line 1: Imports the matplotlib.pyplot module.
  • Line 2: Defines the variable n to store the number of digits to display.
  • Line 3: Creates a figure to display the images.
  • Line 4: Starts a loop to iterate over the first n images in the X_test array.
  • Line 5: Creates a subplot within the figure for the current image.
  • Line 6: Displays the original image in the subplot.
  • Line 7: Sets the colormap to grayscale.
  • Line 8: Hides the x-axis and y-axis labels and ticks.
  • Line 9: Ends the loop.
  • Line 10: Shows the figure with the images.
  • Line 11: Closes the figure.



Now we are reshaping the data. In the artificial neural network we broke the images into pixels, but in the Convolutional Neural Network we give it the whole picture, to which the filter is applied.

It reshapes the x_train and x_test tensors to have the shape (batch_size, height, width, channels), where channels is 1. This is the "channels-last" format used by the TensorFlow backend.

The x_train and x_test tensors are images, and the channels dimension represents the number of color channels in the image. In this case the images are grayscale, so there is only one channel. What if 1 is not given? The model would assume three color channels: Red, Green and Blue.

The reshape() function takes a tensor as input and returns a new tensor with the specified shape. The shape is a list of integers that specifies the number of elements in each dimension of the tensor. In this case, the first dimension of the shape is the batch size, which is the number of images in the dataset. The second and third dimensions are the height and width of the images, respectively. The fourth dimension is the number of channels, which is 1.
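A minimal sketch of this step, assuming the MNIST arrays are loaded from keras.datasets:

from keras.datasets import mnist

# Load MNIST: 60,000 training and 10,000 test images of 28 x 28 pixels
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape to (batch_size, height, width, channels); channels = 1 for grayscale
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
print(X_train.shape)  # (60000, 28, 28, 1)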

In the provided code, we are normalizing the pixel values of the image data in the X_train and X_test arrays to a range between 0 and 1. This normalization step is another important preprocessing technique often used when working with image data in machine learning.

The idea behind this normalization is to scale the pixel values so that they lie within the range [0, 1]. The original pixel values usually span from 0 (black) to 255 (white) in grayscale images. By dividing each pixel value by 255, you effectively rescale the values to be in the range [0, 1], which is more suitable for many machine learning algorithms and neural networks.

Normalizing the data helps in preventing issues related to varying scales of input features. It can also improve the convergence speed and stability of training processes, especially when using optimization algorithms like gradient descent.

Keep in mind that normalization is a crucial step, especially when dealing with neural networks, as it can have a significant impact on the model's performance and training dynamics.
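The normalization itself is a single division; a sketch, continuing the arrays assumed above:

# Scale pixel values from [0, 255] down to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0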

Processing the Target variable

As we know, there are 10 classes, and the target variable can be any digit between 0 and 9. Now we have to convert the target variable into 0s and 1s to make it understandable to the model. The challenge is that we now have 10 classes instead of the 2 classes that were easy to encode in a 0-and-1 format. To overcome this challenge, first have a glance at the pictorial representation.

Suppose we have an image of the digit 0. In the row for that image, the first column will hold the value 1 and the remaining values will be 0. So in our target variable there will be a vector of 10 values for every image, exactly one of which is 1. The location of that 1 tells us the identity of the digit.

For example, if the 1 is located in the column for digit 2, the image is a 2.

In order to get this type of encoding, we import the categorical utility below.

It uses the to_categorical() function from the keras.utils module to convert the labels in the y_train and y_test tensors into one-hot encoded vectors.

One-hot encoding is a technique used to represent categorical data in a way that is compatible with machine learning algorithms. In this case, the labels are categorical, because they represent the class of each image. One-hot encoding converts each label into a vector of binary values, where the index of the 1 value corresponds to the class of the image.

The to_categorical() function takes two arguments: the input tensor and the number of classes. In this case, the number of classes is 10, because there are 10 different classes of images in the dataset.

The output of the to_categorical() function is a tensor with one additional dimension for the class labels. In this case, the output tensor will have the shape (batch_size, 10), where batch_size is the number of images in the dataset.
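A sketch of the encoding step, continuing the arrays assumed above:

from keras.utils import to_categorical

# One-hot encode the labels: e.g. digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print(X_train.shape, y_train.shape)  # (60000, 28, 28, 1) (60000, 10)
print(X_test.shape, y_test.shape)    # (10000, 28, 28, 1) (10000, 10)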

The print() statements in your code will print the shapes of the x_train, y_train, x_test, and y_test tensors. The shapes of the tensors will be updated to reflect the one-hot encoding of the labels.

The output of the print statements will display the shapes of your data arrays after the one-hot encoding has been applied:

  • x_train.shape should be (num_samples, height, width, channels) as you reshaped it earlier.
  • y_train.shape should be (num_samples, num_classes) after one-hot encoding.
  • x_test.shape should similarly have the shape (num_samples, height, width, channels) as x_train.
  • y_test.shape should also have the shape (num_samples, num_classes) after one-hot encoding.

from keras.models import Sequential
from keras.layers.core import Dense, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D

 

We are importing the necessary modules from Keras to build a convolutional neural network (CNN) model. Here's a breakdown of the imports:

1. Sequential: This is the basic type of model in Keras, allowing you to build a linear stack of layers.

2. Dense: This is a fully connected (dense) layer, where every neuron in the layer is connected to every neuron in the previous and subsequent layers. You'll typically use this for the final classification layer.

3. Flatten: This layer is used to flatten the multi-dimensional data into a one-dimensional vector. It's often used to transition from convolutional and pooling layers to fully connected layers.

4. Conv2D: This is a 2D convolutional layer, which applies convolutional operations to the input data. You can specify the number of filters, kernel size, activation function, and more.

5. MaxPooling2D: This layer performs max pooling, which reduces the spatial dimensions of the data while retaining the most important features. It helps with feature extraction and dimensionality reduction.

With these modules imported, you can proceed to build your CNN architecture using Keras. You'll define a Sequential model, add convolutional, pooling, and dense layers, and then compile and train the model using your data.

 

# Define the dimensions of the input image
img_rows, img_cols, channels = 28, 28, 1  # 1 for grayscale images, 3 for RGB images

# Define the number of filters for each layer of the CNN
filters = [6, 32, 80, 120]  # These are the number of filters in each layer of the CNN

# Define the number of classes for classification
classes = 10  # This is the number of different categories that the CNN will classify images into

 

We are setting up various parameters and configurations for a Convolutional Neural Network (CNN) architecture. Here's a breakdown of what each part of the code is doing:

1. Image Dimensions:

·       img_rows and img_cols define the dimensions of the input images. These are set to 28x28, which is common for MNIST dataset-like images.

·       channels specifies the number of color channels in the images. For grayscale images, this is 1, and for RGB images, this would be 3.

2. Number of Filters:

·       filters is a list that defines the number of filters for each layer of the CNN. The first layer has 6 filters, the second has 32 filters, the third has 80 filters, and the fourth has 120 filters.

·       Filters in CNNs are responsible for extracting different features from the input data. The number of filters determines the complexity and capacity of the network to learn various features.

3. Number of Classes:

·       classes specifies the number of different categories (or classes) that the CNN will classify images into. In this case, it's set to 10, which could correspond to different digits in a digit recognition task, for example.

These parameter definitions are common when setting up a CNN architecture. The specific values you've chosen for img_rows, img_cols, channels, filters, and classes will affect the architecture and performance of your CNN model. You can use these parameters when constructing your CNN layers using a framework like TensorFlow/Keras.

# Creating the model
model = Sequential()  # Sequential is a container to store layers

model.add(Conv2D(filters[0], (3, 3), padding='same',
                 activation='relu', input_shape=(img_rows, img_cols, channels)))
model.add(MaxPooling2D(pool_size=(2, 2)))  # For reducing image size
# (dim + pad - kernel) / 2 -> (28 + 3 - 3) / 2 = 14

model.add(Conv2D(filters[1], (2, 2), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# (dim + pad - kernel) / 2 -> (14 + 2 - 2) / 2 = 7

model.add(Conv2D(filters[2], (2, 2), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# (dim + pad - kernel) / 2 -> (7 + 2 - 2) / 2 = 3

model.add(Conv2D(filters[3], (2, 2), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# (dim + pad - kernel) / 2 -> (3 + 2 - 2) / 2 = 1

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

 

We have successfully created a Convolutional Neural Network (CNN) model using Keras. Here's a breakdown of the architecture and the purpose of each layer:

1. Sequential Model:

·       model=Sequential() initializes a sequential container to store layers sequentially.

2. Convolutional and MaxPooling Layers:

·       Conv2D(filters[0], (3,3), padding='same', activation='relu', input_shape=(img_rows, img_cols, channels)) creates a convolutional layer with 6 filters, a kernel size of (3,3), 'same' padding, ReLU activation, and input shape defined by (img_rows, img_cols, channels).

·       MaxPooling2D(pool_size=(2,2)) adds a max-pooling layer with a pool size of (2,2), reducing the image dimensions by half.

3. More Convolutional and MaxPooling Layers:

·       Three more sets of convolutional and max-pooling layers are added similarly. These layers help extract features and reduce spatial dimensions further.

4. Flatten Layer:

·       Flatten() flattens the multi-dimensional output from the previous layers into a one-dimensional vector.

5. Dense Layers:

·       Dense(64, activation='relu') adds a fully connected layer with 64 neurons and ReLU activation.

·       Dense(classes, activation='softmax') adds the final fully connected layer with neurons equal to the number of classes and a softmax activation function for classification probabilities.

6. Model Compilation:

·       model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy']) compiles the model. The loss function is categorical cross-entropy (suitable for multi-class classification), the optimizer is stochastic gradient descent (SGD), and the metric for evaluation is accuracy.

Our CNN architecture looks well-defined and ready for training. You can use the model.fit() method to train it on your training data (x_train and y_train). Remember to preprocess your data appropriately before training.

Pooling Layer

Here is the summary of processes we have passed through:


Now look at the number of parameters in the Fully Connected Neural Network and compare it with the number of parameters in the Convolutional Neural Network. You will see a significant reduction.
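In the notebook, this comparison comes from Keras's own summary; a sketch of the call, assuming the model built above:

# Prints each layer's output shape and parameter count,
# plus the total number of trainable parameters
model.summary()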


It trains the CNN model on the x_train and y_train datasets and evaluates the model on the x_test and y_test datasets.

The fit() function takes the training data and labels, along with a validation split, the number of epochs, and the batch size. The validation split is the fraction of the training data that is used for validation. The number of epochs is the number of times the model will be trained on the entire training data. The batch size is the number of images that are processed at a time.

The evaluate() function takes two arguments: the test data and the labels. The function returns the loss and accuracy of the model on the test data.

The verbose argument controls the amount of output printed during training and evaluation. A value of 0 suppresses all output, a value of 1 shows a progress bar for each epoch, and a value of 2 prints one summary line per epoch.
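The exact training cell is not reproduced here; a minimal sketch under those assumptions (the epochs, batch size and validation split are illustrative values):

# Train the model; 20% of the training data is held out for validation
model.fit(x_train, y_train, validation_split=0.2, epochs=10, batch_size=128, verbose=1)

# Evaluate on the test set; returns the loss and the accuracy
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("Test loss: {:.4f}, test accuracy: {:.4f}".format(loss, accuracy))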


1. Import Libraries:

·       from sklearn.metrics import accuracy_score: Imports the accuracy_score function from scikit-learn, which is used to calculate accuracy.

2. Predict and Evaluate:

·       y_pred_probs = model.predict(x_test, verbose=0): Predicts probabilities for the test set using the trained model.

·       y_pred = np.where(y_pred_probs > 0.5, 1, 0): Converts the predicted probabilities into binary predictions by thresholding at 0.5. If the predicted probability is greater than 0.5, it's considered as class 1; otherwise, it's class 0.

3. Calculate and Print Accuracy:

·       test_accuracy = accuracy_score(y_pred, y_test): Calculates the accuracy between the predicted labels (y_pred) and the true labels (y_test).

·       print("\nTest accuracy: {}".format(test_accuracy)): Prints the calculated test accuracy.

Please note that the threshold value of 0.5 for binary classification might need to be adjusted depending on your specific problem and dataset.

Also, ensure that x_test and y_test are properly prepared and preprocessed before using them for prediction and evaluation. Make sure that the shapes and formats of these arrays match the requirements of your model.
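Put together, the snippet described above looks like this (a reconstruction of the described steps, not the notebook's exact cell):

import numpy as np
from sklearn.metrics import accuracy_score

# Predict class probabilities, then threshold at 0.5 to get one-hot style predictions
y_pred_probs = model.predict(x_test, verbose=0)
y_pred = np.where(y_pred_probs > 0.5, 1, 0)

# With one-hot rows on both sides, accuracy_score computes exact-match accuracy
test_accuracy = accuracy_score(y_pred, y_test)
print("\nTest accuracy: {}".format(test_accuracy))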

Lastly, you've imported matplotlib.pyplot library, but you haven't used it in this code snippet. If you intend to visualize something using matplotlib, you might need to include additional code for that purpose.

 

# Define a mask for selecting a range of indices (20 to 49)
mask = range(20, 50)

# Select 20 samples from the test set (indices 20 to 39) for visualization
X_valid = x_test[20:40]
actual_labels = y_test[20:40]

# Predict probabilities for the selected validation samples
y_pred_probs_valid = model.predict(X_valid)
y_pred_valid = np.where(y_pred_probs_valid > 0.5, 1, 0)

 

We are creating a mask to select a range of indices (20 to 49) and then using this mask to select a subset of the test set for visualization and evaluation. The code you've provided seems correct for this purpose. Here's a breakdown of what each part of the code does:

1. Define Mask:

·       mask = range(20, 50): Defines a mask using the range function to select indices from 20 to 49.

2. Select Validation Samples:

·       X_valid = x_test[20:40]: Uses slicing to select 20 samples from the test set (indices 20 to 39) for visualization.

3. Select Actual Labels:

·       actual_labels = y_test[20:40]: Selects the corresponding actual labels for the selected validation samples.

4. Predict and Threshold:

·       y_pred_probs_valid = model.predict(X_valid): Predicts probabilities for the selected validation samples.

·       y_pred_valid = np.where(y_pred_probs_valid > 0.5, 1, 0): Converts the predicted probabilities into binary predictions using a threshold of 0.5.

The next steps could involve visualizing the selected samples along with their actual labels and predicted labels for comparison. You might also want to calculate and display other metrics to evaluate the performance of your model on this subset of the test set.

# Set up a figure to display images
n = len(X_valid)
plt.figure(figsize=(20, 4))

for i in range(n):
    # Display the original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(X_valid[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display the predicted digit
    predicted_digit = np.argmax(y_pred_probs_valid[i])
    ax = plt.subplot(2, n, i + 1 + n)
    plt.text(0.5, 0.5, str(predicted_digit), fontsize=12, ha='center', va='center')
    plt.axis('off')

# Show the plotted images
plt.show()

# Close the plot
plt.close()

 

The code you've provided is for visualizing a subset of the validation samples along with their predicted digits. This is a common way to visually assess how well your model is performing. Here's a breakdown of the code:

1. Set Up Figure:

·       n = len(X_valid): n is set to the number of samples in your validation subset.

·       plt.figure(figsize=(20, 4)): Sets up a figure with a specified size to display the images.

2. Display Original Images:

·       A loop iterates through the range of n.

·       ax = plt.subplot(2, n, i + 1): Creates a subplot to display the original image.

·       plt.imshow(X_valid[i].reshape(28, 28)): Displays the original image, reshaped to 28x28.

·       plt.gray(): Sets the colormap to grayscale.

·       ax.get_xaxis().set_visible(False) and ax.get_yaxis().set_visible(False): Hides the axes for better visualization.

3. Display Predicted Digits:

·       Within the same loop, a second row of subplots is filled in for each of the n samples.

·       predicted_digit = np.argmax(y_pred_probs_valid[i]): Determines the predicted digit by finding the index of the maximum value in the predicted probabilities.

·       ax = plt.subplot(2, n, i + 1 + n): Creates a subplot to display the predicted digit.

·       plt.text(0.5, 0.5, str(predicted_digit), fontsize=12, ha='center', va='center'): Displays the predicted digit as text in the center of the subplot.

·       plt.axis('off'): Turns off the axis for this subplot.

4. Show and Close Plot:

·       plt.show(): Displays the entire plot with the original images and predicted digits.

·       plt.close(): Closes the plot.

This code will help you visualize how well your model is predicting the digits for a subset of validation samples. The top row shows the original images, and the bottom row displays the predicted digits.

A convolution layer is a fundamental component of the CNN architecture that performs feature extraction, which typically consists of a combination of linear and nonlinear operations, i.e., convolution operation and activation function.

A convolution layer plays a key role in CNN, which is composed of a stack of mathematical operations, such as convolution, a specialized type of linear operation.

Pooling Layer. Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size of the Convolved Feature. This decreases the computational power required to process the data through dimensionality reduction.


What is an activation function and why use them? 

The activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron. 

Explanation: We know, the neural network has neurons that work in correspondence with weight, bias, and their respective activation function. In a neural network, we would update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation. Activation functions make the back-propagation possible since the gradients are supplied along with the error to update the weights and biases. 

Why do we need Non-linear activation function?

A neural network without an activation function is essentially just a linear regression model. The activation function does the non-linear transformation to the input making it capable to learn and perform more complex tasks. 
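As a quick illustration of the common choices (plain NumPy, not the Keras internals):

import numpy as np

def relu(z):
    # Passes positive values through, clips negatives to 0
    return np.maximum(0, z)

def sigmoid(z):
    # Squashes any real value into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()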

 

