AI Free Basic Course | Lecture 26 | AutoEncoders | Live Session
Upwork's press release has been pinned on our social media platforms. Upwork is one of the biggest online freelancing platforms. If you want the opportunity to earn from million-dollar projects, to reinvest those earnings, and to generate and share ideas, Upwork plays a key role in promoting all of this.
When we started this online learning program, we predicted the direction of online earning. We understood what the world is demanding in the field of online earning and what our generation is actually doing. We are still focusing on Canva, WordPress, and the like. Working in these areas is not a bad thing, but working within a limited scope does not mean we should stop pursuing growth, especially when the trends are changing. It is our duty to plan and work hard, and leave the rest to ALLAH to reward that hard work.
If we don't adapt ourselves to the changing demands of the world, we will fail in the end. What if we don't qualify for the in-demand projects and cannot get work? Who is to blame then? Our fate, or ourselves?
According to Upwork's press release, the top 10 generative AI-related searches from companies (January 1 - June 30, 2023) are given below:
- ChatGPT
- BERT
- Stable Diffusion
- TensorFlow
- AI Chatbot
- Generative AI
- Image Processing
- PyTorch
- Natural Language Processing (NLP)
- Bard
At the top is ChatGPT. What is it about ChatGPT that is being demanded? There is a lot to it: content writing, coding, BART, and so on. Wherever BART is used, its link with ChatGPT is essential. BART has different layers, which will be taught in coming lectures.
BART, which stands for Bidirectional
and Auto-Regressive Transformers, is a language
model developed by Facebook. BART is known for its ability to generate
high-quality natural language text, as well as its ability to perform well on a
range of natural language processing tasks.
BERT is second and Stable Diffusion is third. Realizing its importance, we gave Stable Diffusion sufficient time, at least three days. TensorFlow, which was also given due importance in our lectures, is in fourth position.
AI Chatbot, at number 5, is the topic of tomorrow's lecture. Generative AI, at position 6, is the combination of all five discussed above. At position 7 is image processing. Do you remember our pink elephant on the road?
So, we have planned our lessons keeping these AI-related searches from companies in view. We taught neural networks along with Convolutional Neural Networks (ConvNet/CNN). Image processing has also been discussed in the lectures related to Hugging Face. PyTorch is part of our coming lectures. NLP, along with its complete pipeline, has also been discussed in the ChatGPT and Hugging Face lectures.
Bard is at position 10. We started our online lectures with Bard, and additional lectures on the topic were also uploaded.
Moreover, I will go so far as to claim that the queries that will appear in companies' generative AI-related searches in the future are also part of our study program. So there is a need to realize the importance of these concepts and learn them. If you fail to do this, what will happen eventually? Depression and anxiety will creep into our lives because we are not getting projects. We will lose hope and give up in the end. We will then look for cheap shortcuts, and as a result crime grows in society. All of this is the result of not planning, not understanding, not foreseeing, and not managing things at the proper time.
All the discussion so far shows that we are on track and our study plan is to the point. This answers the question constantly asked during the lectures: why have we chosen this course outline? Because we are in the industry and have planned the study program accordingly.
Now we move on to our topic. Today we are discussing autoencoders. An autoencoder is a type of neural network.
Artificial intelligence encompasses a wide range of technologies and techniques that enable computer systems to solve problems like data compression, which is used in computer vision, computer networks, computer architecture, and many other fields. Autoencoders are unsupervised neural networks that use machine learning to do this compression for us.
On a lighter note, the autoencoder is the grandfather of ChatGPT and BERT, as all these models are its descendants. Let us understand how.
The autoencoder structure is built from two parts: one part is called the encoder and the second part is called the decoder. BERT is a collection of encoders: transformer encoders are linked together and representations are learned through them, with many encoders working together. That is why I say BERT is actually a descendant of the autoencoder. Similarly, when we use ChatGPT, there are many decoders working together.
Let us understand this phenomenon with an everyday example. Suppose a special guest arrives at a hotel and a manager is assigned to take care of him. The guest is asked to provide a list of all the food items he needs at breakfast, lunch, and dinner so that arrangements can be made accordingly. What happened here? A large amount of data was given to the manager by the guest so that certain actions could be performed. The detailed instructions were given on the first day.
What happens the next day? The guest simply asks that the same list of food items be followed. The large input has been reduced in this case; however, the output remains the same as on the first day, when detailed instructions were given.
But remember: on day 2, only the person who was already given the detailed input on day 1 can understand the short input.
Similarly, suppose we have a large image. Instead of giving the model this image, which comprises a large number of values (pixels), we encode it into a small representation and give that to the model to generate the same output that would have been produced from the large, unencoded image. This increases the speed of processing and uses less computational power, which in turn increases overall efficiency.
In our childhood, when drawing a sketch by looking at an image, we learned the prominent features of that image and drew something that looked like the original. Now, if we want a computer to perform a similar job, the computer must be given a neural network with compressed input, enabling it to draw the image from the minimal features provided to it. Without a neural network, the computer would not be able to draw that image from the input image, as too much computational power would be required.
There is a minimum amount of information essential to regenerate the image from the input image; below this, the model would not be able to do so.
An autoencoder neural network is an unsupervised machine learning algorithm that
applies backpropagation, setting the target values to be equal to the inputs.
Autoencoders are used to reduce the size of our inputs into a smaller
representation. If anyone needs the original data, they can reconstruct it from
the compressed data.
We have a similar machine learning algorithm, i.e., PCA, which performs the same task. Principal Component Analysis
(PCA) is one of the most commonly used unsupervised machine
learning algorithms across a variety of applications: exploratory data
analysis, dimensionality reduction, information compression, data de-noising,
and plenty more.
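For comparison, here is a minimal PCA sketch, assuming scikit-learn is available; the data matrix X is hypothetical:

# PCA sketch: compress 784 features to 10 components, then reconstruct.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 784)          # hypothetical data: 1000 samples, 784 features

pca = PCA(n_components=10)
X_compressed = pca.fit_transform(X)    # (1000, 10) compressed representation
X_restored = pca.inverse_transform(X_compressed)  # lossy reconstruction: (1000, 784)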
So, you might be thinking: why do we need autoencoders then? Let's find out the reason behind using autoencoders.
Emergence of Autoencoders:
Autoencoders are preferred over
PCA because:
- An autoencoder can learn non-linear transformations with
a non-linear
activation function and multiple layers.
- It doesn't have to use dense layers. It can use convolutional layers, which are better for video, image, and time-series data.
- It is more efficient to learn several
layers with an autoencoder rather than learn one huge transformation with
PCA.
- An autoencoder provides a representation
of each layer as the output.
- It can make use of pre-trained layers from
another model to apply transfer learning to enhance the encoder/decoder.
The examples of CNN mentioned earlier were very basic applications.
Look at the picture above: you can see image segmentation. Through CNNs we have learned how to classify an image and detect a specific object in it. A step further, we started to get information about each and every pixel. Suppose I am looking for a car in the image; then we can differentiate only those pixels that relate to the car from the whole image. Similarly, we can differentiate only those pixels that relate specifically to the road or a tree. So, what are we doing here? We are segmenting, or separating, multiple objects in an image based on their pixels. All this became possible due to the CNNs that we have learned so far.
What is image segmentation? It is the pixel-level detection and classification of objects in an image, learned through CNNs.
Look at the image above: we can see multiple cars, trucks, and poles. The model is identifying each object according to its identity. It identifies cars as cars and trucks as trucks. It even identifies poles precisely, whether they are traffic-signal poles or electric poles. Autonomous cars are the real-life example.
These cars identify the boundary of the road by segmenting the road's pixels from the whole image. They adjust their lanes by identifying the pixels of road lanes, signal poles, and surrounding vehicles. CNNs have made us capable of doing all these tasks.
Similarly, the model identifies different hand gestures with the help of CNN.
A model can now recognize the faces of more than 5,000 people, something the human eye cannot do.
All this has happened due to the progress AI models are making day by day.
Besides this, there are other fields as well, such as medicine. Before AI, X-rays were examined by doctors, and diseases were diagnosed by reading them. Now the X-ray is passed to a model for prediction of different health-related problems. Similarly, CNNs are being used in many other healthcare applications.
After this, a technology evolved with which we are quite familiar nowadays: generative AI. With it, we generate images and text of our own choice.
Let us think about the inception of the generative AI model. At the start, our aim was to generate an image from a neural network. We gave an image to the neural network model; the model examined and understood that image and in return generated another image, similar to the input image. This was the first step in our journey towards generative AI.
The structure used in the development of this generative AI model is called the autoencoder.
Now let’s have a look at a few Industrial Applications of Autoencoders.
Image Coloring
Autoencoders are used for
converting any black and white picture into a colored image. Depending on what
is in the picture, it is possible to tell what the color should be.
Feature variation
It extracts only the required features
of an image and generates the output by removing any noise or unnecessary
interruption.
Dimensionality Reduction
The reconstructed image is the same as our input but with reduced dimensions. It helps in providing a similar image with a reduced pixel count.
Denoising Image
Let us generalize the idea. You have often seen an old, torn image shared on Facebook with a request to restore it. A Photoshop expert comes forward and rectifies the defects in that old picture with the help of Photoshop tools and techniques.
The above job can be performed easily with an autoencoder. Similarly, a black-and-white image can be changed into a color image with an autoencoder.
Now the question arises: if Photoshop can perform the same defect-rectification tasks, then why AI? Before answering, we must first understand the purpose of AI, which is to reduce dependence on human experts. Everyone can use AI as a tool to perform the same job that only Photoshop experts could do before. So the purpose of AI is to empower you and make you productive. AI is smart enough to detect what to extract from the image and what to discard in order to produce a clean image as output.
In the picture above we have a noisy image; we encoded it, decoded it, and got a clear picture at the end. The model actually regenerated the noisy image as a clear image. The applications of autoencoder denoisers will astonish you: sharpening blurry images, coloring black-and-white images, cleaning noisy images, and countless other applications in computer vision. Remember, we are only discussing computer vision here and have not yet touched its applications in the field of language.
The input seen by the
autoencoder is not the raw input but a stochastically corrupted version. A
denoising autoencoder is thus trained to reconstruct the original
input from the noisy version.
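A minimal sketch of that training setup, assuming TensorFlow/Keras and MNIST images scaled to [0, 1]; the noise level and layer sizes here are illustrative, not prescriptive:

# Denoising autoencoder sketch (assumes TensorFlow/Keras; dataset: MNIST).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Stochastically corrupt the inputs; the clean images remain the targets.
x_noisy = np.clip(x_train + 0.3 * np.random.normal(size=x_train.shape), 0.0, 1.0)
x_noisy = x_noisy.astype("float32")

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(64, activation="relu"),      # encoder
    layers.Dense(784, activation="sigmoid"),  # decoder
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Train to reconstruct the original input from the corrupted version.
autoencoder.fit(x_noisy, x_train, epochs=5, batch_size=256)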
Watermark Removal
It is also used for removing
watermarks from images or to remove any object while filming a video or a
movie.
Now that you have an idea of the different industrial applications of autoencoders, let's understand the architecture of autoencoders.
Remember, an autoencoder is different from a pipeline: in a pipeline there is a sequence of commands and actions, while in an autoencoder there is a single command.
ARCHITECTURE OF AUTOENCODERS
An autoencoder is an architecture that can be built from convolutional neural networks, fully connected neural networks, recurrent neural networks, or attention layers. So, we can use this architecture in many applications according to our requirements.
As discussed, Stable Diffusion is actually latent diffusion, and it is a strength of the autoencoder that it learns a latent representation. It is like converting 784 numbers into a 10-number representation. Latent means hidden: when we compress or encode the image, the result is called a latent.
An autoencoder consists of three layers:
- Encoder
- Code
- Decoder
- Encoder: This
part of the network compresses the input into a latent space representation.
The encoder layer encodes the
input image as a compressed representation in a reduced dimension. The
compressed image is a distorted version of the original image.
- Code: This part
of the network represents the compressed input which is fed to the
decoder.
- Decoder: This
layer decodes the
encoded image back to the original dimension. The decoded image is
a lossy reconstruction of the original image and it is
reconstructed from the latent space representation.
Wherever the concept of encoders and decoders is used in computer science, it means we compress the data in a meaningful way and assign a key to it so that nobody can understand it except the holder of that key. When the person who has the key reads that compressed data, the process of reading it is called decoding.
Our aim is to generate images and text. But before generating either, we must have a model that we have trained to perform that job.
We can say that an autoencoder is the type of neural network that reproduces its own input as output. Now the question arises: when the input is already present, why is there a need to predict it as output? Why is the input not passed through directly?
In the last lecture, we took an image from the MNIST data and converted the 28 x 28-pixel data into a single 784-dimensional vector, upon which the model was trained.
An autoencoder has two parts, an encoder and a decoder. In the same case, the encoder part will convert that 784-dimensional vector into just 10 values. The function of the encoder is to compress a larger amount of information into less. Our goal is not to lose any information during compression, so that nothing essential is lost in the process.
The decoder part expands that compressed representation back into a 784-dimensional vector. This means that at the encoder, all the information in the 784 numbers was encoded in a ten-number representation, and that ten-number representation is sufficient to represent those 784 numbers. The decoder recreates the 784 numbers from the ten-number input. So, we recreated the output without losing information. During this process of recreation, we learned a hidden representation that is not only compressed but also preserves the information. So, the larger image is compressed at the encoder stage and then transmitted to the output layer, which recreates the larger image from the compressed one.
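A minimal Keras sketch of this 784 -> 10 -> 784 structure, assuming TensorFlow; in practice such a tiny ten-number code gives only a rough reconstruction, so treat the sizes as the lecture's illustration rather than a recommendation:

# 784 -> 10 -> 784 autoencoder sketch in Keras (assumes TensorFlow).
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))
code = layers.Dense(10, activation="relu")(inputs)        # encoder: 784 numbers -> 10
outputs = layers.Dense(784, activation="sigmoid")(code)   # decoder: 10 numbers -> 784

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# The target equals the input: the network learns to reproduce its own input.
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)

The key line is fit(x_train, x_train): the target values are set equal to the inputs, exactly as the definition given earlier states.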
Autoencoders are applied in denoising, image classification, variational autoencoders, and many other deep learning applications.
The task of researching the
applications of the autoencoders was assigned during the lecture.
The layer between the encoder and decoder, i.e., the code, is also known as the Bottleneck. This is a
well-designed approach to decide which aspects of observed data are relevant
information and what aspects can be discarded. It does this by balancing two
criteria:
- Compactness of representation, measured as the
compressibility.
- It retains some behaviourally
relevant variables from the input.
Convolution Autoencoders
Autoencoders in their traditional formulation do not take into account the fact that a signal can be seen as a sum of other signals. Convolutional autoencoders use the convolution operator to exploit this observation. They learn to encode the input as a set of simple signals and then try to reconstruct the input from them, modifying the geometry or the reflectance of the image.
At the first layer, on the left, we ignore the final output and simply transmit the processed input to the next layer. Similarly, at each subsequent layer the output of the previous layer is processed as input and transmitted onward until the encoder side of the model is complete. At the decoder layers, we take care of producing the output, as in a normal neural network.
What does processing the input data at the encoder stage mean? Processing here means extracting the features and reducing the size of the input data.
At the end of the encoder there is a flattened, one-dimensional vector layer of size 1152.
A one-dimensional vector of size 10 follows this layer, which shows that the image has been reduced to its minimum size.
Now this minimal data is passed to the next layer, which is the first layer of the decoding stage. Here we see it working in the opposite way: the FC (fully connected) layer is now the first layer, whereas it was the last in the encoding process.
At the end of the decoder, an output image is generated that is similar to the input image. The network has taken the features, or information, from the input image and generated another image similar to it.
We can summarize it as: from image to vector, and from vector to image.
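A sketch of that image-to-vector-to-image structure, assuming TensorFlow/Keras and 28 x 28 grayscale input; the filter counts are illustrative, so the flattened layer here is smaller than the 1152 figure mentioned above:

# Convolutional autoencoder sketch (assumes TensorFlow/Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(28, 28, 1))
# Encoder: extract features while shrinking the spatial size.
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)  # 28x28 -> 14x14
x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(x)        # 14x14 -> 7x7
x = layers.Flatten()(x)                        # image to vector (7*7*8 = 392 values)
code = layers.Dense(10, activation="relu")(x)  # minimal 10-number code

# Decoder: the FC layer comes first, then upsampling back to 28x28.
x = layers.Dense(7 * 7 * 8, activation="relu")(code)
x = layers.Reshape((7, 7, 8))(x)               # vector back to a small feature map
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)          # 7x7 -> 14x14
outputs = layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid")(x)  # 14x14 -> 28x28

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")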
Use cases of CAE:
- Image Reconstruction
- Image Colorization
- Latent space clustering
- Generating higher-resolution images
Now that you have an idea of the architecture of an autoencoder, let's continue our autoencoders tutorial and understand the different properties and hyperparameters involved in training autoencoders.
- Data-specific:
Autoencoders are only able to compress data similar to what they have been
trained on.
- Lossy: The
decompressed outputs will be degraded compared to the original inputs.
Hyperparameters of
Autoencoders:
There are 4 hyperparameters
that we need to set before training an autoencoder:
- Code size: It
represents the number of nodes in the middle layer. Smaller size results
in more compression.
- Number of layers: The
autoencoder can consist of as many layers as we want.
- Number of nodes per
layer: The number of nodes per layer decreases with each
subsequent layer of the encoder, and increases back in the decoder. The
decoder is symmetric to the encoder in terms of the layer structure.
- Loss function: We
either use mean squared error or binary cross-entropy. If the
input values are in the range [0, 1] then we typically use cross-entropy,
otherwise, we use the mean squared error.
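Mapping these four choices onto one concrete, illustrative Keras configuration (the values are examples, not recommendations):

# The four hyperparameters made concrete (assumes TensorFlow/Keras).
from tensorflow.keras import layers, models

code_size = 32  # 1. code size: nodes in the middle layer (smaller = more compression)

# 2. number of layers and 3. nodes per layer: 784 -> 128 -> 32 -> 128 -> 784,
#    decreasing into the code and symmetric on the way back out.
autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(code_size, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

# 4. loss function: binary cross-entropy, since inputs are assumed scaled to [0, 1].
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")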
Now that you know the properties and hyperparameters involved in training autoencoders, let's move forward and understand the different types of autoencoders and how they differ from each other.
Sparse Autoencoders
Sparse autoencoders offer us an alternative method
for introducing an information bottleneck without requiring a
reduction in the number of nodes at our hidden layers.
Instead, we’ll construct our loss function such that we penalize activations within
a layer.
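One common way to add such a penalty in Keras is an L1 activity regularizer on the hidden layer; this is a sketch, and the coefficient is illustrative:

# Sparse autoencoder sketch: penalize activations instead of shrinking the layer.
from tensorflow.keras import layers, models, regularizers

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    # The L1 activity penalty pushes most hidden activations toward zero,
    # creating an information bottleneck without reducing the node count.
    layers.Dense(256, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5)),
    layers.Dense(784, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")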
Deep Autoencoders
The extension of the simple Autoencoder is
the Deep Autoencoder. The first layer of the Deep
Autoencoder is used for first-order features in the raw input. The second layer
is used for second-order features corresponding to patterns in the
appearance of first-order features. Deeper layers of the Deep Autoencoder tend
to learn even higher-order features.
A deep autoencoder is composed of two symmetrical deep-belief networks:
1. First four or five shallow layers representing the encoding half
of the net.
2. The second set of four or five layers that make up the decoding
half.
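A sketch of that symmetric stack, using plain dense layers rather than the deep-belief networks mentioned above; the layer widths are illustrative:

# Deep autoencoder sketch: symmetric encoding and decoding halves (Keras).
from tensorflow.keras import layers, models

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    # Encoding half: successive layers capture increasingly higher-order features.
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),   # code
    # Decoding half mirrors the encoder.
    layers.Dense(128, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")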
Use cases of Deep Autoencoders
- Image Search
- Data Compression
- Topic Modeling & Information Retrieval (IR)
Contractive Autoencoders
A contractive autoencoder is an
unsupervised deep learning technique that helps a neural
network encode unlabeled training data. This is accomplished by
constructing a loss term which penalizes
large derivatives of our hidden layer activations with
respect to the input training examples, essentially penalizing instances
where a small change in the input leads to a large change in the encoding
space.
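A sketch of such a loss term, assuming TensorFlow and separate encoder and decoder models (hypothetical names); it would be called inside a custom training step, and the penalty weight lam is illustrative:

# Contractive loss sketch: reconstruction error plus a penalty on the
# derivatives of the hidden code with respect to the input (assumes TensorFlow).
import tensorflow as tf

def contractive_loss(encoder, decoder, x, lam=1e-4):
    # x: a batch of inputs as a float tensor of shape (batch, input_dim).
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = encoder(x)                          # hidden code: (batch, code_dim)
    recon = decoder(h)                          # reconstruction of x
    jac = tape.batch_jacobian(h, x)             # dh/dx: (batch, code_dim, input_dim)
    penalty = tf.reduce_sum(tf.square(jac))     # squared Frobenius norm of the Jacobian
    mse = tf.reduce_mean(tf.square(x - recon))  # reconstruction error
    return mse + lam * penalty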
Now that you have an idea of what autoencoders are, their different types, and their properties, let's move ahead with our autoencoders tutorial and understand a simple implementation using TensorFlow in Python.