TensorFlow Object Detection Model Training
This is a summary of this nice tutorial.
Annotating images and serializing the dataset
All the scripts mentioned in this section receive arguments from the command line and have help messages through the -h/--help flags. Also check the README of the repo they come from for more details, if needed.
- Install labelImg. This is a Python package, which means you can install it via pip, but the one from GitHub is better. It saves annotations in the PASCAL VOC format.
- Annotate your dataset using labelImg.
- Use this script to convert the XML files generated by labelImg into a single CSV file.
- Use this script to split the CSV file in two: one with training examples and one with evaluation examples (the latter is called eval.csv below). Images are selected randomly, and there are options to stratify examples by class, making sure that objects from all classes are present in both datasets. The usual proportion is 75% to 80% of the annotated objects for training and the rest for evaluation.
- Create a “label map” for your classes. You can check some examples to understand what they look like. You can also generate one from your original CSV file with this script.
- Use this script to convert the two CSV files (the training CSV and eval.csv) into two TFRecord files (e.g. a training record and eval.record), a serialized data format that TensorFlow is most familiar with. You'll need the label map from the previous step for this.
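
To make the data flow of the steps above more concrete, here is a minimal sketch in plain Python. It is not the code of the linked scripts. The function names, the CSV column names and the split strategy (random by image, without stratification) are my own assumptions for illustration. The XML layout is the PASCAL VOC one that labelImg writes, and the label map uses the usual `item { id ... name ... }` text format, with ids starting at 1.

```python
import csv
import io
import random
import xml.etree.ElementTree as ET

# Assumed column layout; check the CSV produced by the script you actually use.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Turn one PASCAL VOC annotation into a list of rows, one per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "width": width,
            "height": height,
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def split_by_image(rows, train_fraction=0.8, seed=0):
    """Randomly assign whole images to the training or evaluation set."""
    filenames = sorted({row["filename"] for row in rows})
    random.Random(seed).shuffle(filenames)
    cut = int(len(filenames) * train_fraction)
    train_files = set(filenames[:cut])
    train = [row for row in rows if row["filename"] in train_files]
    evaluation = [row for row in rows if row["filename"] not in train_files]
    return train, evaluation


def build_label_map(rows):
    """Build label map text from the class names found in the rows."""
    classes = sorted({row["class"] for row in rows})
    items = []
    for class_id, name in enumerate(classes, start=1):  # ids start at 1
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)
```

Splitting by image keeps all objects of one picture in the same dataset. The stratification by class mentioned above is not handled here, so use the linked script for real datasets.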
Choosing a neural network and preparing the training pipeline…
- Download one of the neural network models provided on this page. The ones trained on the COCO dataset are the best choice, since they were also trained on objects.
- Provide a training pipeline, which is a config file that usually comes in the tar.gz file downloaded in the previous step. If it doesn't come in the tar.gz, it can be found here. You can find a tutorial on how to create your own here.
- The pipeline config file has some fields that must be adjusted before training is started. Its header describes which ones. Usually, they are the fields that point to the label map, the training and evaluation directories and the neural network checkpoint. In case you downloaded one of the models provided on this page, you should untar the tar.gz file and point the checkpoint path inside the pipeline config file to the "untarred" directory of the model (see this answer for help).
- You should also check the number of classes. COCO has 90 classes, but your dataset may have more or fewer.
- There are additional parameters that may affect how much RAM is consumed by the training process, as well as the quality of the training. Things like the batch size or how many batches TensorFlow can prefetch and keep in memory may considerably increase the amount of RAM necessary, but I won't go over those here as there is too much trial and error in adjusting those.
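
As a rough illustration of the adjustments listed above, the sketch below edits a pipeline config with plain text substitution. It assumes the field names `num_classes`, `fine_tune_checkpoint`, `label_map_path` and `input_path` and the `PATH_TO_BE_CONFIGURED` placeholder, as commonly seen in the sample configs. Check them against the header of your own file. The `SAMPLE` text is heavily abbreviated, and the helper names and all paths are hypothetical.

```python
import re

# Abbreviated, illustrative config; a real pipeline config has many more fields.
SAMPLE = '''
model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 24
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
  }
}
'''


def quote(path):
    """Wrap a path in double quotes, as string fields require."""
    return '"%s"' % path


def set_field(config_text, field, values):
    """Replace each `field: value` occurrence, consuming `values` in order.

    If there are more occurrences than values, the last value is reused.
    """
    remaining = list(values)

    def replace(match):
        value = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return match.group(1) + value

    pattern = re.compile(r'(\b%s:\s*)("[^"]*"|\S+)' % re.escape(field))
    return pattern.sub(replace, config_text)


def configure(config_text, num_classes, checkpoint, label_map,
              train_record, eval_record):
    """Fill in the fields that usually need adjusting before training."""
    text = set_field(config_text, "num_classes", [str(num_classes)])
    text = set_field(text, "fine_tune_checkpoint", [quote(checkpoint)])
    text = set_field(text, "label_map_path", [quote(label_map)])
    # Assumes train_input_reader appears before eval_input_reader in the file.
    text = set_field(text, "input_path",
                     [quote(train_record), quote(eval_record)])
    return text


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(configure(SAMPLE, 3, "models/untarred/model.ckpt",
                    "data/label_map.pbtxt", "data/training_set.record",
                    "data/eval.record"))
```

Editing the file by hand works just as well. A script like this mostly helps when you retrain often with different datasets.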
Training the network
Train the model. This is how you do it locally. Optional: in order to check training progress, TensorBoard can be started pointing its
Export the network, like this.
Use the exported .pb in your object detector.
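
When you run the exported model, the detections usually come back as normalized boxes in `[ymin, xmin, ymax, xmax]` order, together with scores and class ids. The sketch below only shows the post-processing step, assuming that output layout. It does not load the .pb itself, and the function name and the 0.5 threshold are my own choices for illustration.

```python
def to_pixel_boxes(boxes, scores, classes, image_width, image_height,
                   min_score=0.5):
    """Keep detections scoring at least `min_score` and scale boxes to pixels.

    `boxes` holds normalized (ymin, xmin, ymax, xmax) tuples. The returned
    boxes are (xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    results = []
    for (ymin, xmin, ymax, xmax), score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue
        results.append({
            "class_id": int(class_id),  # class ids often arrive as floats
            "score": float(score),
            "box": (
                int(round(xmin * image_width)),
                int(round(ymin * image_height)),
                int(round(xmax * image_width)),
                int(round(ymax * image_height)),
            ),
        })
    return results
```

The class ids in the result map back to names through the label map created earlier.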
In the data augmentation section of the training pipeline, some options can be added or removed to try and make the training better. Some of the options are listed here.