In the previous blog post, we discussed perceptrons. We learnt how to train a perceptron in Python to achieve a simple classification task. If you need a quick refresher on perceptrons, you can check out that blog post before proceeding further. In a way, a perceptron is a single-layer neural network with a single neuron. In this blog post, we will learn how to develop a multilayer neural network. A multilayer neural network consists of multiple layers, each made up of many perceptrons, and it is much better at classifying data than a single perceptron. So how exactly does a multilayer neural network function? How do we build it in Python?
What is a multilayer neural network?
To understand multilayer neural networks, let’s consider the following figure:
As we can see here, this network consists of one input layer, two hidden layers, and one output layer. The hidden layers are not exactly “hidden”; it’s just the nomenclature: any layer that’s neither input nor output is called “hidden”. Each perceptron in the first hidden layer takes all the inputs and makes a decision. The perceptrons in the next layer take the outputs of the perceptrons in the first layer and make their own decisions. Similarly, the perceptron in the output layer makes a decision based on the outputs of the perceptrons in the second hidden layer to produce the final result.
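To make this flow concrete, here is a minimal sketch of a forward pass through such a network. The weights are random placeholders and the hard-threshold activation is only for illustration (biases are omitted for brevity); this is not a trained network:

import numpy as np

# Illustrative forward pass: two hidden layers and one output neuron.
# Random weights stand in for real, trained values; biases are omitted.
def step(z):
    return (z > 0).astype(float)  # perceptron-style hard threshold

x = np.array([1.0, 0.5, -0.3])   # three input values
W1 = np.random.randn(4, 3)       # first hidden layer: 4 perceptrons
W2 = np.random.randn(4, 4)       # second hidden layer: 4 perceptrons
W3 = np.random.randn(1, 4)       # output layer: 1 perceptron

h1 = step(W1.dot(x))             # each perceptron sees all the inputs
h2 = step(W2.dot(h1))            # next layer consumes these outputs
final_output = step(W3.dot(h2))  # the final decision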
How do we train a multilayer neural network?
Each perceptron in the network has a set of weights and a bias associated with it. Wait a minute, what does that mean? Well, as we discussed in the previous blog post, each perceptron has a weight associated with each input. We weigh each input, compute the weighted sum, and then threshold it to decide the output. Now when we say we want to “train” a perceptron, we basically want to find the right set of weights and the threshold (also known as the bias) that will produce the desired output. When we want to train a neural network with many perceptrons, we need to tune the weights and biases of all the perceptrons until we get the desired result.
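As a quick illustration, here is what that decision rule looks like for a single perceptron. The weights and bias below are made-up numbers, not values from any trained network; training is the process of tuning them:

import numpy as np

# A single perceptron: weigh the inputs, add the bias, then threshold.
# These weights and bias are made up for illustration only.
weights = np.array([0.8, -0.4])
bias = -0.1
inputs = np.array([1.0, 2.0])

weighted_sum = np.dot(weights, inputs) + bias  # 0.8 - 0.8 - 0.1 = -0.1
output = 1 if weighted_sum > 0 else 0          # thresholded to 0 here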
By forcing the perceptron output to 0 or 1, aren’t we losing information? Doesn’t that make our neural network less optimal? Well, that’s true! To overcome this, people introduced sigmoid neurons. A sigmoid neuron is basically the same as a perceptron, except that its output is a real number between 0 and 1. This way, the neurons in the network are free to learn much more about the input data. Sigmoid neurons are actually very interesting, and we will discuss them in more detail in the next blog post, along with the training algorithms used to find the associated weights and biases.
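To see the difference concretely, here is a sigmoid neuron using the same made-up weights as the perceptron sketch above; the only change is that the hard threshold is replaced by a smooth squashing function:

import numpy as np

# Sigmoid neuron: same weighted sum, but the output is squashed into (0, 1)
# instead of being forced to exactly 0 or 1.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = np.array([0.8, -0.4])  # same made-up values as before
bias = -0.1
inputs = np.array([1.0, 2.0])

output = sigmoid(np.dot(weights, inputs) + bias)  # about 0.48, not just 0 or 1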
Let’s do it in Python
Now that we know a little bit about training a neural network, let’s see how to build a neural network in Python. Let’s quickly generate some training data:
import numpy as np
import pylab as pl

# Generate training data: a parabola sampled at 40 points
min_value = -11
max_value = 11
x = np.linspace(min_value, max_value, 40)
y = np.square(x) + 5

# Normalize the output values
y = y / np.linalg.norm(y)

# Reshape the data into column vectors, as expected by neurolab
num_samples = len(x)
input_data = x.reshape(num_samples, 1)
output_labels = y.reshape(num_samples, 1)

# Plot the training data
pl.figure(0)
pl.plot(x, y, '.')
pl.show()
If you run the above code, you will see that it’s a simple parabola, as shown in the figure here. An important thing to note is that we have normalized the output values. This is a very important step when you are preparing training data for a neural network; always make sure your labels are normalized. Let’s create a neural network with 1 hidden layer. Technically, the output neurons also count as a “layer”, so we need to create a network with two layers here:
import neurolab as nl

# Create a feed-forward network: 10 neurons in the hidden layer,
# 1 neuron in the output layer
multilayer_net = nl.net.newff([[min_value, max_value]], [10, 1])
Let’s set the training algorithm to gradient descent and train the network:
# Use gradient descent as the training algorithm
multilayer_net.trainf = nl.train.train_gd

# Train the network
error = multilayer_net.train(input_data, output_labels, epochs=500, show=100, goal=0.03)
The number of epochs refers to the number of passes through the full training dataset that the network makes before it stops. The “show” parameter controls how often the training progress is displayed (every 100 epochs here). The “goal” parameter specifies the maximum permissible error: once the error goes below this value, the network stops training even if the maximum number of epochs hasn’t been reached. As you can see in this case, the error didn’t converge within 500 epochs. To overcome this, let’s add another hidden layer:
multilayer_net = nl.net.newff([[min_value, max_value]], [10, 10, 1])
The above network consists of two hidden layers with 10 neurons each. When you make this change and retrain, you will see that the training error now satisfies our constraint. Let’s predict the output for the training inputs and see how the network performs:
# Predict the outputs for the training inputs
predicted_output = multilayer_net.sim(input_data)

# Evaluate the network on a denser set of input points
x2 = np.linspace(-11, 11, 80)
y2 = multilayer_net.sim(x2.reshape(x2.size, 1)).reshape(x2.size)
y3 = predicted_output.reshape(num_samples)

# Plot the network output against the training targets
pl.figure(1)
pl.subplot(212)
pl.plot(x2, y2, '-', x, y, '.', x, y3, 'p')
pl.legend(['network output', 'train target', 'predicted output'])
pl.show()
If you run the above code, you’ll see the following figure:
As you can see, it’s close but not close enough! Let’s reduce the maximum allowed error to 0.001 and increase the number of epochs to 1000:
# Retrain with a tighter error goal and more epochs
error = multilayer_net.train(input_data, output_labels, epochs=1000, show=100, goal=0.001)
If you run the code, you will see the following figure:
The predicted outputs are much closer to the actual outputs in this case. We are all set! You just trained a multilayer neural network to predict the outputs for your training data.
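If you want to quantify how close the fit is, here is a quick sanity check that goes a step beyond the walkthrough above, using only the variables we have already defined:

# Mean squared error between the network's predictions and the training labels
mse = np.mean((multilayer_net.sim(input_data) - output_labels) ** 2)
print('Training MSE: {:.6f}'.format(mse))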
————————————————————————————————————————
I am interested in knowing how to decide the number of hidden layers and the number of neurons in each hidden layer. I am finding it very difficult to decide for my academic project, which has 38 features (i.e. 38 input neurons) and is a classification problem, so 1 output neuron.
Hi Puneet,
It heavily depends on the problem at hand and the computational constraints. Generally, if you increase the number of hidden layers, you give your network more freedom to train (and hence better performance). At the same time, it also increases the amount of time it takes to train your network, so it’s a trade-off! The number of output neurons depends on the number of classes in your training data. For example, if your training data has 3 classes, then you should have 3 output neurons.
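For instance, a hypothetical neurolab setup for data with 38 features and 3 classes might look like the sketch below; the per-feature input ranges and the hidden layer size of 20 are placeholders you would pick for your own data:

import neurolab as nl

# Hypothetical network: 38 inputs (one [min, max] range per feature),
# an arbitrary hidden layer of 20 neurons, and 3 output neurons for 3 classes
net = nl.net.newff([[0, 1]] * 38, [20, 3])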
Thank you! I have one more thing to clarify. As of now, I am not using the neurolab or sklearn frameworks for the development. Instead, I managed to write code based on the basic derivation we use for finding the error and updating the weights and biases.
This way I can get a feel for how the feedback is propagated back to all the layers from the output layer.
But the issue with this is that the outputs vary a lot between executions!
Any idea why that is?