In the previous blog post, we learnt how to build a multilayer neural network in Python. What we did there falls under the category of supervised learning. In that realm, we have some training data and we have the associated labels. The goal is to train the neural network to correctly label our training data. Once we train the model, we can use it to predict the labels of unknown datapoints. But what about unsupervised learning? In the real world, we also have to deal with a lot of unlabeled data. Can we train a neural network to recognize clusters in our data? Yes, we certainly can! Let’s go ahead and see how we can do that in Python, shall we?
Unsupervised learning using neural networks
Unsupervised learning refers to a category of machine learning algorithms that deal with unlabeled data. All we know is that the data can be divided into a certain number of groups. We don’t have any labels and our system has to work with this constraint. One of the most common examples is clustering. Now, how do we train our neural network to recognize these clusters?
The technique we use for this purpose is called Competitive Learning. It is a way to train our neural network in which the neurons compete for the right to respond to our input datapoints. We train the network in such a way that only one neuron will fire at a time. This strategy is called “winner take all”. The layer of neurons that implements this strategy is called a “competitive layer”. Note that this layer can also be used in conjunction with other layers in a bigger neural network whose purpose may or may not be to perform clustering.
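To make the “winner take all” idea concrete, here is a minimal sketch in plain Python. It is not how any particular library implements competitive learning. The function names, the learning rate, and the choice to start each neuron at a random datapoint are all assumptions made for illustration. For every input, the neuron whose weight vector is closest wins, and only that neuron is nudged toward the input.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_winner(weights, point):
    """Index of the neuron whose weight vector is closest to the point."""
    return min(range(len(weights)),
               key=lambda i: squared_distance(weights[i], point))


def train_competitive(points, num_neurons, epochs=20, learning_rate=0.1, seed=0):
    """Winner-take-all training: only the winning neuron is updated per input."""
    rng = random.Random(seed)
    # Assumption: start each neuron at a randomly chosen datapoint.
    weights = [list(p) for p in rng.sample(points, num_neurons)]
    for _ in range(epochs):
        order = list(points)
        rng.shuffle(order)
        for point in order:
            winner = find_winner(weights, point)
            # Move the winner a small step toward the input it responded to.
            weights[winner] = [w + learning_rate * (x - w)
                               for w, x in zip(weights[winner], point)]
    return weights


# Two well-separated 2-D blobs, generated only for this illustration.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.2), rng.gauss(0, 0.2)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.2), rng.gauss(5, 0.2)) for _ in range(50)]
learned = train_competitive(blob_a + blob_b, num_neurons=2)
print(sorted(learned))
```

After training, each weight vector should sit near the middle of one blob, which is exactly the behaviour we want from a competitive layer: the weights themselves become the cluster centers.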
How to do it in Python?
Let’s go ahead and see how we can perform clustering using neural networks. We need to create some dummy data to play around with:
import numpy as np

centroids = np.array([[1, 2, 1], [2, 5, 2], [-3, 3, 0], [5, -1, -1], [-2, -2, -2]])
These are the 3-dimensional points that will act as our centroids. We will generate data centered on these points.
num_centroids = len(centroids)
dimensionality = centroids.shape[1]  # 3 coordinates per point
num_datapoints = 100  # points generated per centroid

# Gaussian noise with standard deviation 0.8 around each centroid
gaussian_distribution = 0.8 * np.random.randn(num_datapoints, num_centroids, dimensionality)
input_datapoints = np.array([centroids + x for x in gaussian_distribution])
input_datapoints.shape = (num_datapoints * num_centroids, dimensionality)
np.random.shuffle(input_datapoints)
We drew Gaussian noise and added it to each centroid, which gives us 100 datapoints per centroid (500 in total). Let’s plot these points:
import pylab as pl
from mpl_toolkits.mplot3d import Axes3D

fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot(input_datapoints[:,0], input_datapoints[:,1], input_datapoints[:,2], 'b.')
pl.show()
When you run this code, you will see this:
You can hold and drag it to see it from different viewpoints. You can see that there are 5 different clusters of points. The next step is to prepare this data for training. We need to normalize this data:
normalization_factor = np.linalg.norm(input_datapoints)
input_datapoints_norm = input_datapoints / normalization_factor
This data is now ready to be used.
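It helps to see what this normalization actually does. For a 2-D array, np.linalg.norm returns by default the Frobenius norm, that is, the square root of the sum of squares of every entry. Dividing by this single number shrinks all coordinates by the same factor, so the clusters keep their relative positions and we can get back to the original scale by multiplying with the same factor. The sketch below reproduces this in plain Python on three sample points; the helper names are made up for illustration.

```python
import math


def frobenius_norm(rows):
    """Square root of the sum of squares of every entry in a list of rows."""
    return math.sqrt(sum(value ** 2 for row in rows for value in row))


def scale(rows, factor):
    """Multiply every entry by the same scalar."""
    return [[value * factor for value in row] for row in rows]


# Three of the centroids from above, used here as sample points.
points = [[1.0, 2.0, 1.0], [-3.0, 3.0, 0.0], [5.0, -1.0, -1.0]]

factor = frobenius_norm(points)
normalized = scale(points, 1.0 / factor)
restored = scale(normalized, factor)

print(factor)                      # norm of the whole array
print(frobenius_norm(normalized))  # the normalized array has norm 1
print(restored)                    # back to the original coordinates
```

Note that negative coordinates stay negative after this kind of normalization; only the magnitude changes.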
Training the neural network
Let’s create a single-layer neural network with 3 inputs (3 = dimensionality of the input datapoints) and 5 neurons (5 = number of clusters):
import neurolab as nl

neural_net = nl.net.newc([[0, 1] for _ in range(dimensionality)], num_centroids)
Here, each [0, 1] pair tells the network the expected range of one input dimension. Keep in mind that our normalized data also contains small negative values, so this range is only an approximation. At this point, the network only sees the datapoints; it has no information about the centroids. We want the network to be able to automatically segment these datapoints into 5 clusters and then recognize these centroids for us. Let’s train the network:
error = neural_net.train(input_datapoints_norm, epochs=100, show=50)
Let’s see what the network predicted:
predicted_centroids = neural_net.layers[0].np['w']  # weights of the competitive layer
predicted_centroids = predicted_centroids * normalization_factor  # back to the original scale
If you look at the values of ‘predicted_centroids’, you can see that they are close to the actual centroids. Let’s plot them:
fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot(input_datapoints[:,0], input_datapoints[:,1], input_datapoints[:,2], 'c.')
ax.plot(centroids[:,0], centroids[:,1], centroids[:,2], 'r*', markersize=12)
ax.plot(predicted_centroids[:,0], predicted_centroids[:,1], predicted_centroids[:,2], 'kh', markersize=8)
pl.legend(['datapoints', 'actual centroids', 'predicted centroids'])
pl.show()
If you run the above code, you will see this:
If you increase the number of epochs to 1000, you will see this:
In this case, the predicted centroids are closer to the actual centroids. You can play around with the 3D plot. If you rotate it, you can see that there are distinct clusters and the predicted centroids are very close to the actual centroids. You are all set! You just trained a neural network to perform clustering.
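If you would rather check the “close to the actual centroids” claim with numbers than by eye, one option is to match every predicted centroid to its nearest actual centroid and look at the distance. The neurons come out in no particular order, so we cannot simply compare row by row. The sketch below is plain Python with a hypothetical helper name, and the values in illustrative_predictions are made up for this example. They are not output from the network above.

```python
import math


def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to the point."""
    distances = [
        math.sqrt(sum((p - c) ** 2 for p, c in zip(point, centroid)))
        for centroid in centroids
    ]
    index = min(range(len(centroids)), key=lambda i: distances[i])
    return index, distances[index]


# The actual centroids used to generate the data.
actual_centroids = [[1, 2, 1], [2, 5, 2], [-3, 3, 0], [5, -1, -1], [-2, -2, -2]]

# Made-up values standing in for 'predicted_centroids' (illustration only).
illustrative_predictions = [
    [-2.9, 3.1, 0.1],
    [4.8, -1.0, -0.9],
    [1.1, 1.9, 1.0],
    [-2.1, -1.8, -2.0],
    [2.0, 5.2, 1.9],
]

matches = [nearest_centroid(p, actual_centroids) for p in illustrative_predictions]
for predicted, (index, distance) in zip(illustrative_predictions, matches):
    print(predicted, '->', actual_centroids[index], 'distance: %.3f' % distance)
```

With real output from the network, you can pass predicted_centroids.tolist() in place of the made-up values. If every actual centroid gets matched exactly once and the distances are small, the network has found all 5 clusters.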