Is there any way to read in data for object detection in TensorFlow (e.g. similar to Caffe's WindowDataLayer)? I've tried looking around for examples that do this, but haven't found any.
3 Answers
The standard data format that TensorFlow uses is the Example protocol buffer, which has a generic notion of "Feature" that should support Caffe-style WindowData. The documentation has some information on this format, and the source code includes an example application for converting image data (the simple MNIST format) to this format, for use with the standard input pipeline.
If you follow that approach, you would most likely store the image as a "bytes" feature and add dense integer features for the coordinates of the windows and for the labels.
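As a rough illustration of that layout, here is a minimal sketch that packs one image and its windows into an Example. The feature keys ("image/encoded", "image/boxes", "image/labels"), the helper name, and the (xmin, ymin, xmax, ymax) box order are assumptions made for this example, not a convention TensorFlow requires.

```python
import tensorflow as tf


def make_window_example(image_bytes, boxes, labels):
    """Pack one encoded image and its windows into a tf.train.Example.

    boxes: list of (xmin, ymin, xmax, ymax) integer tuples (assumed order).
    labels: one integer class id per box.
    The feature keys below are hypothetical; pick whatever names suit you.
    """
    if len(boxes) != len(labels):
        raise ValueError("need exactly one label per box")
    # Flatten [(x1, y1, x2, y2), ...] into a single integer list.
    flat_boxes = [int(coord) for box in boxes for coord in box]
    feature = {
        "image/encoded": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "image/boxes": tf.train.Feature(
            int64_list=tf.train.Int64List(value=flat_boxes)),
        "image/labels": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(label) for label in labels])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Placeholder bytes stand in for a real JPEG/PNG file's contents.
example = make_window_example(
    b"<encoded image bytes>", [(10, 20, 110, 220), (5, 5, 50, 60)], [1, 3])
serialized = example.SerializeToString()

# Round-trip through the protocol buffer to check what was stored.
decoded = tf.train.Example.FromString(serialized)
```

The serialized string is what you would write to a record file for the input pipeline. Because the number of windows varies per image, the box list here has variable length, so the reader has to reshape it into rows of four.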
I have been using TensorFlow for object detection over the last few weeks and have released some of my code as TensorBox. The input format is a text file in IDL format (see here, for example) with a list of image names and a list of the bounding boxes in each image. You can switch out these input files to train and test on your own images.
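If you want to inspect or generate such a file yourself, something like the sketch below may help. It assumes each line has the form `"image_name": (x1, y1, x2, y2), (x1, y1, x2, y2);`, which is my understanding of the IDL layout rather than something taken from the TensorBox source, and `parse_idl_line` is a hypothetical helper, not a TensorBox function.

```python
import re

# One number: optional sign, digits, optional fractional part.
_NUM = r"(-?\d+(?:\.\d+)?)"
# One box: "(x1, y1, x2, y2)" with flexible whitespace.
BOX_RE = re.compile(
    r"\(\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\)")
# Quoted image name at the start of the line, then everything after it.
LINE_RE = re.compile(r'^"([^"]+)"\s*:?(.*)$')


def parse_idl_line(line):
    """Return (image_name, [(x1, y1, x2, y2), ...]) for one IDL-style line.

    Assumed layout: "name": (x1, y1, x2, y2), (x1, y1, x2, y2);
    Anything after a box, such as a score suffix, is ignored.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not start with a quoted image name")
    image_name, rest = match.groups()
    boxes = [tuple(float(value) for value in groups)
             for groups in BOX_RE.findall(rest)]
    return image_name, boxes


# Hypothetical sample lines, not taken from a real dataset.
name, boxes = parse_idl_line('"images/0001.png": (10, 20, 110, 220), (5, 5, 50, 60);')
empty_name, empty_boxes = parse_idl_line('"images/0002.png";')
```

Check the parsed boxes against a few images before training, since a swapped coordinate order is easy to miss.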
Take a look at the following file in smallcorgi's GitHub repository, which implements the Faster R-CNN architecture in TensorFlow. That file is an example of how to read PASCAL VOC-formatted XML files with bounding box annotations.
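For a sense of what reading such an annotation involves, here is a minimal standalone sketch using only the standard library. It is not the code from that repository; the sample XML and the `parse_voc_annotation` helper are made up for illustration, and only the standard VOC tags (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`) are read.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the PASCAL VOC layout, trimmed to the tags used here.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return (filename, [(class_name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        # Some tools write coordinates as floats, so go through float first.
        box = tuple(int(float(bndbox.findtext(tag)))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), box))
    return root.findtext("filename"), objects


filename, objects = parse_voc_annotation(SAMPLE_XML)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`. You would still need to map class names to integer ids before feeding the boxes to a network.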