Deep Learning With Caffe In Python – Part III: Training A CNN

In the previous blog post, we learnt how to interact with a Caffe model. In this blog post, we will learn how to train a proper CNN. Up until now, we were dealing with a single layer network: we just defined it in a prototxt file and visualized it. If we want our CNN to perform any meaningful task, we need to define a multilayer network and train it on a large amount of data. Caffe makes it very easy to train a multilayer network: we specify all the parameters in a prototxt file, create a training database, and just train the network. Let’s go ahead and see how to do that, shall we?

Training a deep neural network

We are now ready to create our own model. Make sure you have some labeled training data; if you don’t, you can use any of the datasets listed here. Before we start training a network, we need the following:

  • Model definition: A prototxt file containing the model definition (like the one we had earlier)
  • Learning algorithm: A prototxt file describing the parameters for the stochastic gradient descent algorithm. This is called the solver file.
  • Mean image: We need to compute the mean image of the training dataset
  • Training data: A text file containing the training data images in a specific format
  • Testing data: A text file containing the test data images in a specific format

For now, you can take the files named “train_val.prototxt” and “solver.prototxt” located in “/path/to/caffe/models/bvlc_reference_caffenet”. Rename them to “my_train_val.prototxt” and “my_solver.prototxt” so that it’s clearer for you. Open up “my_solver.prototxt” in a text editor and change the params to make it look like this:

net: "my_train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 10000
display: 20
max_iter: 50000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/mymodel/train"
solver_mode: GPU

Make sure to create the folder “models/mymodel” in the current directory. The solver file “my_solver.prototxt” looks for the file “my_train_val.prototxt” in the same directory and the path “models/mymodel” should be relative to this directory.
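As a side note, the “step” policy in the solver file above multiplies the learning rate by gamma every stepsize iterations. Here is a quick Python sketch of that schedule, using the values from my_solver.prototxt:

```python
# Caffe's "step" learning-rate policy: the rate is multiplied by
# gamma every stepsize iterations. Values from my_solver.prototxt.
base_lr, gamma, stepsize = 0.001, 0.1, 10000

def step_lr(iteration):
    """Learning rate at a given iteration under the "step" policy."""
    return base_lr * (gamma ** (iteration // stepsize))

for it in (0, 10000, 25000, 49999):
    print(it, step_lr(it))
```

With max_iter set to 50000, the rate thus decays from 0.001 down to 1e-7 by the end of training.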

We need to create two more files: train_files.txt and test_files.txt. These files should contain the images and their corresponding labels in the following format:

/path/to/folder/image1.jpg 0
/path/to/folder/image2.jpg 3
/path/to/folder/image3.jpg 1
/path/to/folder/image4.jpg 2
/path/to/folder/image5.jpg 1
...
...
/path/to/folder/imageN.jpg N-1

The above file lists images belonging to N classes, with 0-indexed labels. It’s important that the lines in both text files are shuffled, so that images from different classes appear in random order.
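If your images happen to be arranged with one folder per class, you can generate shuffled listing files with a few lines of Python. This is just a sketch; the folder layout (and the idea of deriving labels from sorted folder names) is an assumption, so adapt it to your data:

```python
import os
import random

def write_listing(image_root, out_path, seed=0):
    """Write "path label" lines in shuffled order.

    Assumes image_root contains one subfolder per class; the label
    is the (0-indexed) position of the class folder in sorted order.
    """
    classes = sorted(d for d in os.listdir(image_root)
                     if os.path.isdir(os.path.join(image_root, d)))
    lines = []
    for label, cls in enumerate(classes):
        folder = os.path.join(image_root, cls)
        for name in sorted(os.listdir(folder)):
            lines.append("%s %d" % (os.path.join(folder, name), label))
    random.Random(seed).shuffle(lines)  # interleave the classes
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Run it once for the training folder and once for the test folder to get train_files.txt and test_files.txt.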

Computing image mean

We need to compute the image mean for our dataset so that we can use it during training. Subtracting this mean image from each input image centers the data, a preprocessing step that researchers have found empirically to improve performance. Caffe provides a way to compute the image mean directly, but it needs the training images in an lmdb database first. Run the following command to generate the lmdb database:

$ GLOG_logtostderr=1 /path/to/caffe/build/tools/convert_imageset --resize_height=256 --resize_width=256 --shuffle / /path/to/train.txt /path/to/train_lmdb

We are now ready to compute the mean image. Run the following command:

$ /path/to/caffe/build/tools/compute_image_mean /path/to/train_lmdb /path/to/mean_image.binaryproto
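Conceptually, compute_image_mean just averages the training images pixel by pixel, and Caffe later subtracts that average from every input. A toy numpy sketch of the idea, with random arrays standing in for real images:

```python
import numpy as np

# Toy stand-in for a training set: 4 "images" of shape (3, 8, 8).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 3, 8, 8)).astype(np.float32)

# Per-pixel mean over the whole training set -- conceptually what
# compute_image_mean stores in the binaryproto file.
mean_image = images.mean(axis=0)

# At training time Caffe subtracts this mean from every input image.
centered = images - mean_image
print(centered.mean())  # close to 0 by construction
```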

Training the model

Open up my_train_val.prototxt and change the first few lines as given below:

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "mean_image.binaryproto"
  }
  image_data_param {
    source: "train_files.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "mean_image.binaryproto"
  }
  image_data_param {
    source: "test_files.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}

In the “fc8” layer, change the “num_output” parameter to the number of classes you have. We are now ready to train:

$ /full/path/to/caffe/build/tools/caffe train --solver /full/path/to/my_solver.prototxt

If everything goes well, Caffe will start printing log messages on the terminal. You can watch the error values converge as the iterations proceed. Once the error is low enough, say 1e-6, you can stop the training. If it reaches the maximum number of iterations specified in the solver file, it will stop by itself.
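The loss values are embedded in Caffe’s log lines, so you can also track convergence programmatically. A small sketch; the log excerpt below is made up for illustration, but it follows the “Iteration N, loss = X” pattern Caffe prints:

```python
import re

# Hypothetical excerpt of the log lines Caffe prints during training.
log = """\
I0101 12:00:00.000000  1234 solver.cpp:228] Iteration 0, loss = 2.30259
I0101 12:00:05.000000  1234 solver.cpp:228] Iteration 20, loss = 1.89211
I0101 12:00:10.000000  1234 solver.cpp:228] Iteration 40, loss = 1.41073
"""

# Pull out (iteration, loss) pairs to watch convergence.
pairs = [(int(i), float(l))
         for i, l in re.findall(r"Iteration (\d+), loss = ([\d.eE+-]+)", log)]
print(pairs)
```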

——————————————————————————————————————

9 thoughts on “Deep Learning With Caffe In Python – Part III: Training A CNN”

  1. Is there a way to calculate the mean directly with Caffe if we are not using lmdb, but just the text files containing the image names and their labels?

    1. Caffe needs to access the image data in a particular format in order to compute the mean. That’s the reason we need the lmdb database! If you don’t want to use lmdb, you can write your own OpenCV code to compute mean. It should be pretty straightforward.
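    A minimal sketch of that approach, with numpy only; the image loader (in practice something like cv2.imread plus a fixed-size resize) is assumed and passed in as a callable:

```python
import numpy as np

def mean_from_listing(listing_path, load_image):
    """Per-pixel mean of the images named in a listing file.

    load_image is a callable mapping a path to an HxWxC array,
    e.g. OpenCV's imread plus a resize to a fixed size (assumed
    here, not shown).
    """
    mean, count = None, 0
    with open(listing_path) as f:
        for line in f:
            path = line.split()[0]
            img = load_image(path).astype(np.float64)
            count += 1
            if mean is None:
                mean = img
            else:
                mean += (img - mean) / count  # incremental running mean
    return mean
```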

  2. Your tutorials are very helpful to a beginner like me. I need some help with multi-label classification of Images using Caffe where labels are a 1 dimensional vector of length 9. Out of LMDB/IMAGEDATA/HDF5 data layer only HDF5 supports multilabel classification. But I am facing trouble while training the caffe for the hdf5 files. Any help would be great. Thanks

  3. These tutorials are great ! Cannot commend you enough for the brilliant effort. I am a beginner and found limited resources on how to understand and write prototxt files(my_train_val.prototxt, etc). It would be helpful if you could cite some references. Thanks !

  4. I tried running the command to compute the image mean (/path/to/caffe/build/tools/compute_image_mean /path/to/train_lmdb /path/to/mean_image.binaryproto). Having set all parameters correctly, I got the following error:

    I1108 14:34:03.058652 28733 db_lmdb.cpp:35] Opened lmdb /home/doc/caffe/training1/mytrain_lmdb
    Segmentation fault (core dumped)

    Can you suggest what I did wrong?

  5. Hi there,
    I download the mnist dataset but they are like this:
    t10k-images-idx3-ubyte.gz
    t10k-labels-idx1-ubyte.gz
    train-images-idx3-ubyte.gz
    train-labels-idx1-ubyte.gz
    then I unzip them and I get this:
    t10k-images.idx3-ubyte
    t10k-labels.idx1-ubyte
    train-images.idx3-ubyte
    train-labels.idx1-ubyte
    I don’t know how to make train_files.txt and test_files.txt as you mentioned with the format:
    /path/to/folder/image1.jpg 0
    /path/to/folder/image2.jpg 3
    /path/to/folder/image3.jpg 1
    /path/to/folder/image4.jpg 2
    /path/to/folder/image5.jpg 1


    /path/to/folder/imageN.jpg N-1

    Please tell me how,
    Thanks and regards,
    Alice
