Deep Learning With Caffe In Python – Part IV: Classifying An Image

In the previous blog post, we learnt how to train a convolutional neural network (CNN). One of the most popular use cases for a CNN is to classify images. Once the CNN is trained, we need to know how to use it to classify an unknown image. The trained model is stored as a “caffemodel” file, so we need to load that file, preprocess the input image, and then extract the output tags for that image. In this post, we will see how to load a trained model file and use it to classify an image. Let’s go ahead and see how to do it, shall we?

Loading the model

Training a full network takes time, so we will use an existing trained model to classify an image for now. There are many models available here for tasks such as flower classification, digit classification, scene recognition, and so on. We will be using the caffemodel file available here. Download and save it before you proceed. Open up a new Python file, import `caffe` and `numpy` (as `np`), and add the following lines:

net = caffe.Net('/path/to/caffe/models/bvlc_reference_caffenet/deploy.prototxt',
'bvlc_reference_caffenet.caffemodel', caffe.TEST)

This will load the model into the “net” object. Make sure you substitute the right paths in the input parameters in the above lines.

Preprocessing the image

Let’s define the transformer:

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

The function of the transformer is to preprocess the input image and transform it into something that Caffe can understand. Let’s set the mean image:

transformer.set_mean('data', np.load('/path/to/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))

The mean image needs to be subtracted from each input image. A couple of other params:

transformer.set_transpose('data', (2,0,1))
transformer.set_channel_swap('data', (2,1,0)) # needed if the image is loaded as RGB
transformer.set_raw_scale('data', 255.0)

The “set_transpose” function here will transform an image from (256,256,3) to (3,256,256). The “set_channel_swap” function will change the channel ordering. Caffe uses the BGR image format, so we need to change the image from RGB to BGR. If you are using OpenCV to load the image, then this step is not necessary since OpenCV also uses the BGR format. The “set_raw_scale” function rescales the pixel values from the 0-1 range returned by “caffe.io.load_image” to the 0-255 range the model expects.
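To make these steps concrete, here is a numpy-only sketch of what the transformer does under the hood. This is illustrative, not Caffe's actual implementation: the `preprocess` function and the channel-mean values below are made up for the example. Note also how `.mean(1).mean(1)` in the earlier snippet collapses a (3,256,256) mean image into three per-channel means.

```python
import numpy as np

def preprocess(img, channel_mean):
    """Mimic set_raw_scale, set_transpose, set_channel_swap, and mean subtraction."""
    x = img * 255.0                        # set_raw_scale: [0, 1] -> [0, 255]
    x = x.transpose(2, 0, 1)               # set_transpose: (H, W, 3) -> (3, H, W)
    x = x[::-1, :, :]                      # set_channel_swap: RGB -> BGR
    x = x - channel_mean.reshape(3, 1, 1)  # subtract the per-channel mean
    return x

img = np.random.rand(256, 256, 3)       # stand-in for a loaded image in [0, 1]
mean = np.array([104.0, 117.0, 123.0])  # illustrative BGR channel means
out = preprocess(img, mean)
print(out.shape)  # (3, 256, 256)
```

The output is now channels-first, BGR-ordered, and mean-centered, which is the layout the “data” blob expects.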

We need to reshape the blobs so that they match the image shape. Let’s add the following line to the Python file:

net.blobs['data'].reshape(1,3,227,227)
The first input parameter specifies the batch size. Since we are only classifying one image, it is set to 1. The next three parameters correspond to the number of channels and the size of the cropped image. The model we loaded was trained on 227×227 sub-images cropped from each input image, which makes the model more robust. That’s the reason we are using 227 here!
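For intuition, taking a 227×227 sub-image from a 256×256 image can be sketched in plain numpy. The `center_crop` helper below is hypothetical, just to show the indexing; Caffe handles cropping internally.

```python
import numpy as np

def center_crop(img, size=227):
    """Take a size x size crop from the center of an (H, W, C) image."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((256, 256, 3))
crop = center_crop(img)
print(crop.shape)  # (227, 227, 3)
```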

Classifying an image

Let’s load the input image:

img = caffe.io.load_image('myimage.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', img)

Let’s compute the output:

output = net.forward()

The predicted output class can be printed using:

print output['prob'].argmax()
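On the 'prob' blob, argmax simply returns the index of the highest probability. A tiny sketch with a dummy probability vector in place of real network output:

```python
import numpy as np

prob = np.array([0.05, 0.70, 0.10, 0.15])  # pretend softmax output over 4 classes
print(prob.argmax())  # 1: index of the most likely class
```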

Download this file before you proceed. It contains the mapping required for the labels. Let’s print the top five predicted labels:

label_mapping = np.loadtxt("synset_words.txt", str, delimiter='\t')
best_n = net.blobs['prob'].data[0].flatten().argsort()[-1:-6:-1]
print label_mapping[best_n]

The above lines will print the top five labels predicted by the CNN, best match first. You will get the following output:

['n03837869 obelisk' 'n03743016 megalith, megalithic structure'
 'n02793495 barn' 'n03028079 church, church building' 'n02980441 castle']
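The slice `[-1:-6:-1]` may look cryptic: `argsort` sorts indices in ascending order of score, and the slice walks that array backwards from the largest score, giving the top five indices. A small numpy sketch with dummy scores:

```python
import numpy as np

scores = np.array([0.01, 0.30, 0.05, 0.40, 0.04, 0.20])  # dummy class scores
best_n = scores.argsort()[-1:-6:-1]  # top five indices, highest score first
print(best_n)  # [3 1 5 2 4]
```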

Isn’t it awesome? Looks like we are all set! During the course of these four blog posts, you learnt how to define, visualize, train, and run a CNN using Caffe. Keep playing with it and you’ll see that it can be used to perform a wide variety of computer vision tasks!


11 thoughts on “Deep Learning With Caffe In Python – Part IV: Classifying An Image”

  1. Thanks for the post. I have a question: if I create a custom Python layer in Caffe, will its computation take place on the GPU, or will it only run on the CPU? Is it necessary to implement a layer in C++ and CUDA to make use of the GPU?

    1. A custom Python layer works only on the CPU. Data will be copied to the CPU in order to apply that layer. Hence, having the Python layer as the first or last layer will reduce the impact on performance.

      To use both CPU and GPU you have to add .cpp and .cu files for that layer.

  2. When I run this script I get the following error. How do I solve it?

    Traceback (most recent call last):
    File "", line 8, in
    'bvlc_reference_caffenet.caffemodel', caffe.TEST)
    RuntimeError: Could not open file bvlc_reference_caffenet.caffemodel

  3. What a brilliant tutorial, Prateek Joshi. Thank you very much. I’d like a suggestion from you. Consider the problem of “finding” a specific sort of grain/seed in an image of ground that is full of soil, other sorts of grains, insects, etc. Can I work with Caffe for this as well? In the case of objects with very particular characteristics, what can I use? Caffe as well? Thank you a lot.