Topic 22 | Neural Networks
AI Free Basic Course | Lecture 22 | Neural Networks
In today's lecture we will continue with neural networks; in the last lecture we gave a basic idea of what to expect from a neural network. Today we will explain hyperparameters and their components, and we will see the effect of each hyperparameter on the neural network. We will use the Playground and practice in a notebook. At the end of the lecture, we will implement a basic network using TensorFlow. Today we will go through things at a high level and discuss the notebook in detail in tomorrow's lecture: what the Dense construct means in TensorFlow, what the layers do, what the loss function is, and what the possibilities are for the hidden layer and the output layer.
After practicing in TensorFlow you will get a better idea about these things. So, we will implement a neural network in TensorFlow and run it to see the outcome.
Suppose we have a few balls and our target is to throw them into a basket placed at a distance of 10 meters. When we throw the first ball, there is a possibility that it will fall next to the basket or short of it. Before throwing the second ball you will change the trajectory; the angle will be adjusted so that the ball may fall into the basket. By making small changes in the angle of the trajectory the ball falls near the basket, and if you increase the angle too much the ball will fall far away from the basket. The trajectory value we keep changing is a parameter. The basket at the distance of 10 meters is a fixed parameter. The number of attempts you make is also a parameter which is not being changed. But the trajectory is a parameter which is being adjusted.
The settings we change while training a neural network are called hyperparameters, and the size of the adjustments we make to improve the trajectory so that the ball falls into the basket is called the learning rate. The special thing about hyperparameters is that you cannot learn them from data. Parameters are the internal values of the neurons that are learned from data during training, while hyperparameters are a matter of judgment: they are learned through trial and error rather than from data. As in the example of throwing the ball, the trajectory we choose is not learned from the available data; it is common sense that helps us decide the trajectory. In the same way, while designing a computer vision system, it is common sense and experience that help us decide which hyperparameters to select. Similarly, different hyperparameters would be required for classifying logs than for Natural Language Processing (NLP). So, we can say that choosing hyperparameters is an art rather than a science.
For example, the residents of a desert or forest can, while travelling, sense the dangerous areas on their way and adjust their route and timing accordingly. This sense develops after spending time in that area and facing different situations again and again; people new to the area would not have it. In the same way, after building neural networks again and again, a sense develops that guides us as to which hyperparameters would produce better results. So, remember: parameters are learned from data, and hyperparameters are learned from trial and error and experience. The learning rate in the example quoted above is also a hyperparameter.
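To relate this to code: in a framework such as TensorFlow, the weights inside the layers are the parameters that training learns from data, while values such as the learning rate are hyperparameters that we pick ourselves. A minimal sketch, assuming an arbitrary small model (the layer sizes and learning rate here are illustrative choices, not prescriptions):

```python
import tensorflow as tf

# Hyperparameter: set by us through trial, error, and experience.
LEARNING_RATE = 0.01

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=LEARNING_RATE),
              loss='binary_crossentropy')

# Parameters: the weights and biases inside the layers, learned from data.
print(model.count_params())
```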
Following are the possible parameters in the basketball example:
- The force with which we throw the ball
- The height from which we throw the ball
- The distance between us and the basket
- The height of the basketball pole, which, although fixed, is also a parameter
In the first attempt, if we throw the ball with a certain force and height and the resulting trajectory does not produce the desired result, we will gradually adjust our force and height, again and again, based on our previous experience, to achieve the target of throwing the ball into the basket. So, it depends on what our ultimate goal is and which hyperparameters would be required to achieve it.
Remember, do not confuse hyperparameters with the penalty we studied in reinforcement learning. The concept of fine-tuning, or penalty, is applied to a model which has already been trained and with which we have only limited freedom to play further. Such a model will learn easily when new data is given to it for training.
There is a difference between teaching history to a person who holds a master's degree in English and teaching it to a person who is illiterate. To teach history to the illiterate person, he first has to learn English, and only then will he be able to gain the knowledge of history. So fine-tuning is like teaching a literate person, who will not take much time and effort to learn new skills, while training a neural network from scratch is like teaching a person who has no previous background: it takes much more time and effort to teach him new skills.
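As a rough sketch of what fine-tuning looks like in TensorFlow (the base model and layer choices here are illustrative assumptions, not part of the lecture's notebook): we take an already-trained model, freeze its learned parameters, and train only a small new part on the new task.

```python
import tensorflow as tf

# The "literate person": a model already trained on a large dataset.
base = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),
                                         include_top=False, pooling='avg')
base.trainable = False  # freeze the already-learned parameters

# Add a small new head and train only this part on the new task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```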
The term Gradient Descent comprises two words: gradient means slope, and descent means moving downwards. For example, stepping down from a roof is an example of gradient descent.
Now we discuss the relation between the learning rate and gradient descent and explain it with the help of an example. Suppose you are standing blindfolded on the peak of a mountain and your goal is to reach the lowest point. The peak of the mountain is the point from where we start training our neural network, and it has the maximum value of loss. Our target is to move towards the point which has the minimum value of loss. However, we don't know how to reach there, as we can't see because of the blindfold. One way would be to be a superman, so that you are not hurt if you slip on the way down. The other, practical solution is to start moving and pick the direction in which you feel you are going downhill. However, you also have to decide the pace at which you move: either you jump like a kangaroo or you move slowly with small steps until you reach your destination. When you jump like a kangaroo, there is a possibility that you overshoot the lowest point and start to move up another mountain after reaching the ground. Since your goal is to reach the ground, not to climb another mountain, your loss in this case will start increasing.
If you climb up the other mountain, realize that you are moving upward, and jump back down, you may land not on the ground but at another point on the previous mountain. With small, slow steps this will not happen: as soon as you step upward after reaching the ground, you notice that you are gaining height and immediately step back to the desired point on the ground.
The size of the jump or step you take to reach your desired point on the ground is called the learning rate.
Let us explain it with another example. Suppose you have a 3-foot rod to measure with, the total distance is 11 feet, and, again, you are blindfolded. What will happen? After taking four steps of 3 feet you will have covered 12 feet and overshot your desired point by one foot. However, if you measure the distance with a rod that is 1 foot long, you will stop after measuring 11 times and will not miss your desired point.
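To make this concrete, here is a minimal sketch of gradient descent in plain Python, assuming a toy function f(x) = x² whose gradient is 2x (the starting point and learning rates are arbitrary, chosen only to illustrate the effect of step size):

```python
def gradient_descent(learning_rate, start=5.0, steps=10):
    """Minimize f(x) = x**2; its gradient (slope) is 2*x."""
    x = start
    for _ in range(steps):
        grad = 2 * x                  # slope at the current position
        x = x - learning_rate * grad  # step downhill; step size = learning rate
    return x

print(gradient_descent(0.1))  # small steps: ends near the minimum at 0
print(gradient_descent(1.1))  # kangaroo jumps: overshoots and moves away from 0
```

With the small learning rate, x shrinks steadily towards 0; with the large one, each step jumps past the minimum to the other side and lands even further away, just like overshooting the 11-foot mark with a 3-foot rod.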
To make the learning process easy, we divide a syllabus into small parts and test students step by step. In the same way, instead of giving a huge amount of data or images to the model at once, the model is given small chunks of data; this process of dividing a huge amount of data is called batching.
Batch size is also a hyperparameter. If the students are intelligent, a larger part of the syllabus can be given to them at once, and less time will be consumed. An epoch is like a whole book: we can divide it into chapters for training the model, and one epoch means going through the whole book once.
So far, we have discussed the following hyperparameters; the sketch after this list shows where each one appears in code:
- Learning rate
- Batch size
- Number of epochs
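In TensorFlow/Keras these three hyperparameters appear directly as arguments to compile() and fit(). A minimal sketch with placeholder data (the values 0.001, 32, and 5 are arbitrary illustrative choices):

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for a real dataset: 1000 samples, 4 features, labels 0/1.
x = np.random.rand(1000, 4).astype('float32')
y = np.random.randint(0, 2, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Learning rate: the size of each gradient-descent step.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy', metrics=['accuracy'])

# Batch size: samples per chunk; epochs: full passes over the whole dataset.
model.fit(x, y, batch_size=32, epochs=5)
```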
Look at the picture already discussed. Every neuron in this picture makes a decision on the basis of the data given to it. The value of the decision taken by each neuron is passed to the next neuron. The next neuron, on the basis of the decision value provided to it, adds its own decision value and passes it on. So, the neurons in the first layers work with simple values that become more complicated layer by layer. The neuron at the very end acts like a super neuron that makes the final decision on the basis of the values contributed by all the neurons; simply put, it decides whether the given image is a cat or a dog.
Remember, the concepts of the hidden layer and the output layer must be clear in your mind. The decision-making neurons are called output neurons, the processing neurons are called hidden neurons, and the neurons providing the input are called input neurons. So, in a neural network there are three types of layers.
We have to decide which type of neuron should be used in each type of layer. Suppose we decide that in the hidden layers we will always use ReLU, the popular max function. If the output neurons have to make a decision on a binary classification, we use sigmoid at the output neurons; we used this when discussing logistic regression.
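A quick way to see what these two activations do is to apply them to a few sample values (the numbers are arbitrary): ReLU outputs max(0, x), passing positive values through and clamping negatives to zero, while sigmoid squashes any value into the range 0 to 1, which is why it suits yes/no decisions.

```python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])

print(tf.nn.relu(x).numpy())     # [0. 0. 0. 1. 3.] -- negatives clipped to zero
print(tf.nn.sigmoid(x).numpy())  # every value squashed to between 0 and 1
```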
After binary classification, let us move towards inputs of more than two types, out of which the model has to identify one. Suppose we have three shapes as input, square, triangle, and circle, and the model has to identify which shape it is. Now we have to go beyond the binary yes/no classification done by the output layer so far. To identify items of more than two types we use softmax. We can say that softmax is like a council of judges which assigns a value to each output according to its knowledge; the output given the maximum value will be our answer. The number of neurons in the softmax layer depends on the number of classes in the input: if there are three classes, the number of softmax neurons will also be three.
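Putting the pieces together, here is a minimal sketch of the kind of network described above, assuming a toy three-class shape problem (the random data, feature count, and layer sizes are placeholders for illustration): ReLU in the hidden layer and three softmax neurons at the output, one per class, whose outputs sum to 1.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 300 samples, 8 features, labels 0/1/2 (square/triangle/circle).
x = np.random.rand(300, 8).astype('float32')
y = np.random.randint(0, 3, size=(300,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                      # input layer
    tf.keras.layers.Dense(16, activation='relu'),    # hidden layer: ReLU
    tf.keras.layers.Dense(3, activation='softmax'),  # output: one neuron per class
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x, y, batch_size=32, epochs=5)

# The three softmax outputs are the "judges' scores"; they sum to 1,
# and the class with the maximum score is the model's answer.
probs = model.predict(x[:1])
print(probs, probs.sum())
```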