Object Detection using Fast R-CNN

December 6, 2016

# **Table of Contents**

* Summary
* Setup
* Run the toy example
* Run Pascal VOC
* Train Cognitive Toolkit Fast R-CNN on your own data
* Technical details
* Algorithm details

# **Summary**

![enter image description here][1] ![enter image description here][16]

The above are example images and object annotations for the grocery data set (left) and the Pascal VOC data set (right) used in this tutorial.

**Fast R-CNN** is an object detection algorithm proposed by **Ross Girshick** in 2015. The paper was accepted to ICCV 2015 and is archived at https://arxiv.org/abs/1504.08083. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs a *region of interest pooling* scheme that allows reusing the computations from the convolutional layers.

The main resources for Cognitive Toolkit Fast R-CNN are:

* **Recipe:** [Brain Script config file][2] (see fastrcnn.cntk).
* **Pre-trained models:** download pre-trained Fast R-CNN models for the [grocery][3] data or the [Pascal VOC][4] data set.
* **Data:** example data (food items in a fridge) in the section titled **Example data and baseline model**, and Pascal VOC data in the section titled **Run Pascal VOC**.
* **How to run:** follow the description below.

Additional material: a detailed tutorial for object detection using Cognitive Toolkit Fast R-CNN (including optional SVM training and publishing the trained model as a REST API) can be found [here][7].

# **Setup**

Currently, Cognitive Toolkit only supports *Python 3.4*. We recommend installing Anaconda Python (http://continuum.io/downloads) and creating a Python 3.4 environment using:

    conda create --name cntk python=3.4.3 numpy scipy
    activate cntk

To run the code in this example, you need to install a few additional packages.
Under Python 3.4 (64-bit version assumed), go to the FastRCNN folder and run:

    pip install -r requirements.txt

Known issue: to install scikit-learn you might have to run `conda install scikit-learn`.

You will further need Scikit-Image and OpenCV to run these examples (and possibly numpy and scipy if your Python 3.4 package does not come with them). You need to download the corresponding wheel packages and install them manually. For Windows users, visit http://www.lfd.uci.edu/~gohlke/pythonlibs/ and download:

* scikit_image-0.12.3-cp34-cp34m-win_amd64.whl
* opencv_python-3.1.0-cp34-cp34m-win_amd64.whl

Once you have downloaded the respective wheel binaries, install them with:

    pip install your_download_folder/scikit_image-0.12.3-cp34-cp34m-win_amd64.whl
    pip install your_download_folder/opencv_python-3.1.0-cp34-cp34m-win_amd64.whl

This tutorial code assumes you are using the 64-bit version of Python 3.4, as the Fast R-CNN DLL files under [utils_win64][8] are prebuilt for this version. If your task requires a different Python version, please recompile these DLL files yourself in the correct environment. The tutorial further assumes that the folder where cntk.exe resides is in your PATH environment variable.

# Example data and baseline model

We use a pre-trained AlexNet model as the basis for Fast R-CNN training. Both the example data set and the pre-trained AlexNet model can be downloaded by running the following Python command from the FastRCNN folder:

    python install_fastrcnn.py

# **Run the toy example**

In the toy example we train a Cognitive Toolkit Fast R-CNN model to detect grocery items in a refrigerator. All required scripts are in **<cntkroot>/Examples/Image/Detection/FastRCNN**.

# Quick guide

To run the toy example, make sure that in `PARAMETERS.py` **datasetName** is set to `"grocery"`.

* Run `A1_GenerateInputROIs.py` to generate the input ROIs for training and testing.
* Run `A2_RunCntk.py` to train a Fast R-CNN model and compute test results.
* Alternatively you can download the pre-trained Cognitive Toolkit Fast R-CNN model for the grocery example [here][8]. (See details in the section titled **Using a pre-trained model**.)
* Run `A3_ParseAndEvaluateOutput.py` to compute the mAP (mean average precision, see the section titled **mAP (mean Average Precision)**) of the trained model.

The output from script A3 should contain the following:

    Evaluating detections
    AP for avocado = 1.0000
    AP for orange = 0.2500
    AP for butter = 1.0000
    AP for champagne = 1.0000
    AP for eggBox = 0.7500
    AP for gerkin = 1.0000
    AP for joghurt = 1.0000
    AP for ketchup = 1.0000
    AP for orangeJuice = 1.0000
    AP for onion = 1.0000
    AP for pepper = 0.7600
    AP for tomato = 0.6400
    AP for water = 0.5000
    AP for milk = 1.0000
    AP for tabasco = 1.0000
    AP for mustard = 1.0000
    Mean AP = 0.8688
    DONE.

To visualize the bounding boxes and predicted labels you can run `B3_VisualizeOutputROIs.py` (click on the images to enlarge):

![enter image description here][12] ![enter image description here][13] ![enter image description here][14] ![enter image description here][15] ![enter image description here][16]

# Step details

**A1:** The script `A1_GenerateInputROIs.py` first generates ROI candidates for each image using selective search (see the section titled **Selective Search**). It then stores them in the [Cognitive Toolkit Text Format][18] as input for `cntk.exe`. Additionally, the required Cognitive Toolkit input files for the images and the ground truth labels are generated. The script generates the following folders and files under the **FastRCNN** folder:

* **proc** - root folder for generated content.
  * **grocery_2000** - contains all generated folders and files for the **grocery** example using **2000** ROIs. If you run again with a different number of ROIs, the folder name changes correspondingly.
    * **rois** - contains the raw ROI coordinates for each image stored in text files.
    * **cntkFiles** - contains the formatted Cognitive Toolkit input files for images (**train.txt** and **test.txt**), ROI coordinates (**xx.rois.txt**) and ROI labels (**xx.roilabels.txt**) for **train** and **test**. (Format details are provided in the section titled **Cognitive Toolkit input file format**.)

All parameters are contained in *PARAMETERS.py*, for example *cntk_nrRois = 2000* to set the number of ROIs used for training and testing. We describe the parameters in the section **Parameters** below.

**A2:** The script `A2_RunCntk.py` runs Cognitive Toolkit using cntk.exe and a brain script config file (see the section **Cognitive Toolkit configuration**). A script that uses the new Cognitive Toolkit Python API for Fast R-CNN training will be added soon. The trained model is stored in the folder `cntkFiles/Output` of the corresponding `proc` sub-folder. The trained model is tested separately on both the training set and the test set. During testing, a label is predicted for each image and each corresponding ROI, and stored in the files `test.z` and `train.z` in the `cntkFiles` folder.

**A3:** The evaluation step parses the Cognitive Toolkit output and computes the mAP (see the section **mAP (mean Average Precision)**) by comparing the predicted results with the ground truth annotations. Non-maximum suppression (see the section **NMS (Non Maximum Suppression)**) is used to merge overlapping ROIs. You can set the threshold for non-maximum suppression in `PARAMETERS.py` (see the section **Parameters**).

# Using a pre-trained model

Download links for pre-trained models are provided at the top of this page. Store the model in the **cntkFiles/Output** folder under the corresponding proc sub-folder, for example **proc/grocery_2000/cntkFiles/Output** for the toy example.

**Note:** if you are using a pre-trained model you still need to run step A2 to compute the predicted labels, i.e. Cognitive Toolkit will skip the training and only run the testing.
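For convenience, the A1-A3 steps can be driven from one small script. The following is a minimal sketch (our own helper, not part of the tutorial code) that assembles the three commands and, when `dry_run=False`, executes them from the FastRCNN folder:

```python
import subprocess
import sys

# The three pipeline steps from this tutorial, in order:
# A1 generates ROIs and CNTK input files, A2 trains/tests, A3 evaluates.
PIPELINE = [
    "A1_GenerateInputROIs.py",
    "A2_RunCntk.py",
    "A3_ParseAndEvaluateOutput.py",
]

def build_commands(python_exe=sys.executable):
    """Return one command per pipeline step."""
    return [[python_exe, script] for script in PIPELINE]

def run_pipeline(dry_run=True):
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)  # abort on the first failing step

if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

If you use a pre-trained model, you would still keep the A2 step in the list, since (as noted above) it computes the predicted labels.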
# Additional scripts

There are three optional scripts you can run to visualize and analyze the data:

* `B1_VisualizeInputROIs.py` visualizes the candidate input ROIs.
* `B2_EvaluateInputROIs.py` computes the recall of the ground truth ROIs with respect to the candidate ROIs.
* `B3_VisualizeOutputROIs.py` visualizes the bounding boxes and predicted labels.

# **Run Pascal VOC**

The [Pascal VOC][25] (PASCAL Visual Object Classes) data is a well-known set of standardised images for object class recognition. Training or testing Cognitive Toolkit Fast R-CNN on the Pascal VOC data requires a GPU with at least 4 GB of RAM. Alternatively you can run on the CPU, which will however take some time. In that case we strongly recommend downloading the pre-trained model (see the section **Using a pre-trained model**).

# Getting the Pascal VOC data

You need the 2007 (trainval and test) and 2012 (trainval) data as well as the precomputed ROIs used in the original paper. You need to follow the folder structure described below. The scripts assume that the Pascal data resides in **\<cntkroot\>/Examples/Image/DataSets/Pascal**. If you are using a different folder, please set `pascalDataDir` in `PARAMETERS.py` correspondingly.
* Download and unpack the 2012 trainval data to **DataSets/Pascal/VOCdevkit2012**
  * Website: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
  * Devkit: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
* Download and unpack the 2007 trainval data to **DataSets/Pascal/VOCdevkit2007**
  * Website: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/
  * Devkit: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
* Download and unpack the 2007 test data into the same folder **DataSets/Pascal/VOCdevkit2007**
  * http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
* Download and unpack the precomputed ROIs to **DataSets/Pascal/selective_search_data**
  * http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/selective_search_data.tgz

The **VOCdevkit2007** folder should look like this (similarly for 2012):

    VOCdevkit2007/VOC2007
    VOCdevkit2007/VOC2007/Annotations
    VOCdevkit2007/VOC2007/ImageSets
    VOCdevkit2007/VOC2007/JPEGImages

# Running Cognitive Toolkit on Pascal VOC

To run on the Pascal VOC data, make sure that in `PARAMETERS.py` **datasetName** is set to `"pascal"`.

* Run `A1_GenerateInputROIs.py` to generate the Cognitive Toolkit formatted input files for training and testing from the downloaded ROI data.
* Run `A2_RunCntk.py` to train a Fast R-CNN model and compute test results.
* If you downloaded the pre-trained model (section **Using a pre-trained model**) you still need to run step A2 to compute the predicted labels. To decrease the required time you can skip computing the predictions for the training data by setting `command = Train:WriteTest` (i.e. removing `WriteTrain`) in the `fastrcnn.cntk` file.
* Run `A3_ParseAndEvaluateOutput.py` to compute the mAP (see the section **mAP (mean Average Precision)**) of the trained model.
* Please note that this is work in progress and the results are preliminary, as we are training new baseline models.
* Please make sure you have the latest version from the Cognitive Toolkit master branch for the files [fastRCNN/pascal_voc.py][29] and [fastRCNN/voc_eval.py][30] to avoid encoding errors.

# **Train on your own data**

To train a Cognitive Toolkit Fast R-CNN model on your own data set, we provide two scripts to annotate rectangular regions on images and assign labels to these regions. The scripts store the annotations in the format required by the first step of running Fast R-CNN (`A1_GenerateInputROIs.py`).

First, store your images in the following folder structure:

* `<your_image_folder>/negative` - images used for training that don't contain any objects
* `<your_image_folder>/positive` - images used for training that do contain objects
* `<your_image_folder>/testImages` - images used for testing that do contain objects

For the negative images you do not need to create any annotations. For the other two folders use the provided scripts:

* Run `C1_DrawBboxesOnImages.py` to draw bounding boxes on the images.
  * In the script set `imgDir = <your_image_folder>` before running.
  * Add annotations using the mouse cursor. Once all objects in an image are annotated, pressing key 'n' writes the .bboxes.txt file and proceeds to the next image, 'u' undoes (i.e. removes) the last rectangle, and 'q' quits the annotation tool.
* Run `C2_AssignLabelsToBboxes.py` to assign labels to the bounding boxes.
  * In the script set `imgDir = <your_image_folder>` before running...
  * ... and adapt the classes in the script to reflect your object categories, for example `classes = ("dog", "cat", "octopus")`.
  * The script loads the manually annotated rectangles for each image, displays them one-by-one, and asks the user to provide the object class by clicking on the respective button to the left of the window. Ground truth annotations marked as either "undecided" or "exclude" are fully excluded from further processing.
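After annotating, it is worth sanity-checking that every rectangle got a label before running A1. The following is a minimal sketch, assuming one `.bboxes.txt` file per image with an absolute-pixel `x1 y1 x2 y2` rectangle per line and a companion labels file with one class name per line (both file conventions are assumptions; adapt them to the files the C-scripts produce on your machine):

```python
def load_annotation(bboxes_path, labels_path, classes):
    """Load rectangles and labels for one image and check they are consistent.

    Assumed formats (adapt to your own annotation files):
      - bboxes file: one "x1 y1 x2 y2" rectangle per line, absolute pixels
      - labels file: one class name per line, same order as the rectangles
    """
    with open(bboxes_path) as f:
        boxes = [tuple(int(v) for v in line.split()) for line in f if line.strip()]
    with open(labels_path) as f:
        labels = [line.strip() for line in f if line.strip()]

    assert len(boxes) == len(labels), "every rectangle needs exactly one label"
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        assert x2 > x1 and y2 > y1, "rectangles must have positive width/height"
        assert label in classes or label in ("undecided", "exclude"), \
            "unknown class: %s" % label

    # drop boxes marked undecided/exclude, mirroring what the C2 script does
    return [(box, label) for box, label in zip(boxes, labels)
            if label not in ("undecided", "exclude")]
```

Rectangles labeled "undecided" or "exclude" are dropped, matching the behaviour described above.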
Before running Cognitive Toolkit Fast R-CNN using scripts A1-A3 you need to add your data set to `PARAMETERS.py`:

* Set the data set name to a new name, e.g. `datasetName = "myOwnImages"`.
* Under `# project-specific parameters` add a new section for `myOwnImages`. You can start by copying the section for **grocery**.
* Adapt the classes to reflect your object categories. Following the above example this would look like `classes = ('__background__', 'dog', 'cat', 'octopus')`.
* Set `imgDir = <your_image_folder>`.
* Optionally you can adjust more parameters, e.g. for ROI generation and pruning (see the **Parameters** section).

Now you are ready to train on your own data! (Use the same steps as for the toy example in the **Quick guide** section.)

# **Technical details**

**Parameters**

The main parameters in `PARAMETERS.py` are:

* `datasetName` - which data set to use
* `cntk_nrRois` - how many ROIs to use for training and testing
* `nmsThreshold` - non-maximum suppression threshold (in the range [0,1]). The lower the value, the more ROIs will be combined. It is used for both evaluation and visualization.

All parameters for ROI generation, such as minimum and maximum width and height, are described in `PARAMETERS.py` next to the parameters themselves. They are all set to reasonable default values. You can overwrite them in the `# project-specific parameters` section corresponding to the data set you are using.

# Cognitive Toolkit configuration

The Cognitive Toolkit brain script configuration file that is used to train and test Fast R-CNN is [fastrcnn.cntk][33].
The part that constructs the network is the `BrainScriptNetworkBuilder` section in the `Train` command:

    BrainScriptNetworkBuilder = {
        network = BS.Network.Load ("../../../../../pre-trainedModels/AlexNet.model")
        convLayers = BS.Network.CloneFunction(network.features, network.conv5_y, parameters = "constant")
        fcLayers = BS.Network.CloneFunction(network.pool3, network.h2_d)

        model (features, rois) = {
            featNorm = features - 114
            convOut = convLayers (featNorm)
            roiOut = ROIPooling (convOut, rois, (6:6))
            fcOut = fcLayers (roiOut)
            W = ParameterTensor{($NumLabels$:4096), init="glorotUniform"}
            b = ParameterTensor{$NumLabels$, init = 'zero'}
            z = W * fcOut + b
        }.z

        imageShape = $ImageH$:$ImageW$:$ImageC$   # 1000:1000:3
        labelShape = $NumLabels$:$NumTrainROIs$   # 21:64
        ROIShape = 4:$NumTrainROIs$               # 4:64

        features = Input {imageShape}
        roiLabels = Input {labelShape}
        rois = Input {ROIShape}

        z = model (features, rois)

        ce = CrossEntropyWithSoftmax(roiLabels, z, axis = 1)
        errs = ClassificationError(roiLabels, z, axis = 1)

        featureNodes = (features:rois)
        labelNodes = (roiLabels)
        criterionNodes = (ce)
        evaluationNodes = (errs)
        outputNodes = (z)
    }

In the first line the pre-trained AlexNet is loaded as the base model. Next, two parts of the network are cloned: `convLayers` contains the convolutional layers with constant weights, i.e. they are not trained further; `fcLayers` contains the fully connected layers with the pre-trained weights, which will be trained further. The node names `network.features`, `network.conv5_y` etc. can be derived from the log output of the cntk.exe call (contained in the log output of the `A2_RunCntk.py` script).

The model definition (`model (features, rois) = ...`) first normalizes the features by subtracting 114 from each channel and pixel. Then the normalized features are pushed through `convLayers`, followed by `ROIPooling` and finally `fcLayers`.
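To make the `ROIPooling (convOut, rois, (6:6))` step concrete, here is a schematic numpy version. It is an illustration of the idea only, not the Cognitive Toolkit operator, and it assumes the ROI is already given in feature-map pixel coordinates:

```python
import numpy as np

def roi_max_pool(conv_map, roi, out_size=6):
    """Max-pool one ROI of a conv feature map to a fixed out_size x out_size grid.

    conv_map: feature map of shape (channels, height, width)
    roi:      (x, y, w, h) in feature-map pixel coordinates
    """
    channels = conv_map.shape[0]
    x, y, w, h = roi
    out = np.zeros((channels, out_size, out_size), dtype=conv_map.dtype)
    for i in range(out_size):        # grid rows
        for j in range(out_size):    # grid columns
            # integer sub-window boundaries, at least one pixel each
            y0 = y + (i * h) // out_size
            y1 = max(y0 + 1, y + ((i + 1) * h) // out_size)
            x0 = x + (j * w) // out_size
            x1 = max(x0 + 1, x + ((j + 1) * w) // out_size)
            out[:, i, j] = conv_map[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out
```

Whatever the ROI's width and height, the output is always `channels x 6 x 6`, which is exactly why the pre-trained fully connected layers can consume it.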
The output shape (width:height) of the ROI pooling layer is set to **(6:6)** since this is the shape and size that the pre-trained `fcLayers` from the AlexNet model expect. The output of the `fcLayers` is fed into a dense layer that predicts one value per label (`NumLabels`) for each ROI.

The following six lines define the input: an image of size 1000 x 1000 x 3 (`$ImageH$:$ImageW$:$ImageC$`), ground truth labels for each ROI (`$NumLabels$:$NumTrainROIs$`) and four coordinates per ROI (`4:$NumTrainROIs$`) corresponding to (x, y, w, h), all relative to the full width and height of the image.

`z = model (features, rois)` feeds the input images and ROIs into the defined network model and assigns the output to `z`. Both the criterion (`CrossEntropyWithSoftmax`) and the error (`ClassificationError`) are specified with `axis = 1` to account for the prediction error per ROI.

The reader section of the Cognitive Toolkit configuration is listed below. It uses three deserializers:

* `ImageDeserializer` to read the image data. It picks up the image file names from **train.txt**, scales each image to the desired width and height while preserving the aspect ratio (padding empty areas with **114**), and transposes the tensor to have the correct input shape.
* One `CNTKTextFormatDeserializer` to read the ROI coordinates from **train.rois.txt**.
* A second `CNTKTextFormatDeserializer` to read the ROI labels from **train.roilabels.txt**.

The input file formats are described in the next section.
    reader = {
        randomize = false
        verbosity = 2
        deserializers = ({
            type = "ImageDeserializer" ; module = "ImageReader"
            file = train.txt
            input = {
                features = { transforms = (
                    { type = "Scale" ; width = $ImageW$ ; height = $ImageH$ ; channels = $ImageC$ ; scaleMode = "pad" ; padValue = 114 }:
                    { type = "Transpose" }
                )}
                ignored = {labelDim = 1000}
            }
        }:{
            type = "CNTKTextFormatDeserializer" ; module = "CNTKTextFormatReader"
            file = train.rois.txt
            input = { rois = { dim = $TrainROIDim$ ; format = "dense" } }
        }:{
            type = "CNTKTextFormatDeserializer" ; module = "CNTKTextFormatReader"
            file = train.roilabels.txt
            input = { roiLabels = { dim = $TrainROILabelDim$ ; format = "dense" } }
        })
    }

# Cognitive Toolkit input file format

There are three input files for Cognitive Toolkit Fast R-CNN, corresponding to the three deserializers described above:

1) `train.txt` contains in each line first a sequence number, then an image filename and finally a `0` (which is currently still needed for legacy reasons of the ImageReader).

    0 image_01.jpg 0
    1 image_02.jpg 0
    ...

2) `train.rois.txt` ([Cognitive Toolkit text format][34]) contains in each line first a sequence number, then the `|rois` identifier followed by a sequence of numbers. These are groups of four numbers corresponding to (x, y, w, h) of an ROI, all relative to the full width and height of the image. There is a total of 4 * number-of-rois numbers per line.

    0 |rois 0.2185 0.0 0.165 0.29 ...

3) `train.roilabels.txt` ([Cognitive Toolkit text format][34]) contains in each line first a sequence number, then the `|roiLabels` identifier followed by a sequence of numbers. These are groups of number-of-labels numbers (either zero or one) per ROI, encoding the ground truth class in a one-hot representation. There is a total of number-of-labels * number-of-rois numbers per line.

    0 |roiLabels 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
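Producing the `|rois` and `|roiLabels` lines from absolute pixel rectangles is a small conversion. The following is a sketch of the formats described above (the helper names are ours, and the (x, y) origin is assumed to be the top-left corner of the box):

```python
def make_rois_line(seq_id, boxes, img_w, img_h):
    """Format one train.rois.txt line: (x, y, w, h) per ROI, relative to image size.

    boxes: list of absolute-pixel rectangles (x1, y1, x2, y2).
    """
    vals = []
    for x1, y1, x2, y2 in boxes:
        vals += [x1 / img_w, y1 / img_h, (x2 - x1) / img_w, (y2 - y1) / img_h]
    return "%d |rois %s" % (seq_id, " ".join("%.4f" % v for v in vals))

def make_labels_line(seq_id, class_indices, num_labels):
    """Format one train.roilabels.txt line: a one-hot vector per ROI."""
    vals = []
    for idx in class_indices:
        one_hot = [0] * num_labels
        one_hot[idx] = 1
        vals += one_hot
    return "%d |roiLabels %s" % (seq_id, " ".join(str(v) for v in vals))
```

Each line carries one image's ROIs, so line `n` of all three files must describe the same image with the same sequence number.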
# **Algorithm details**

**Fast R-CNN**

R-CNNs for object detection were first presented in 2014 by [Ross Girshick et al.][35] and were shown to outperform previous state-of-the-art approaches on one of the major object recognition challenges in the field: [Pascal VOC][36]. Since then, two follow-up papers were published which contain significant speed improvements: [Fast R-CNN][37] and [Faster R-CNN][38].

The basic idea of R-CNN is to take a deep neural network which was originally trained for image classification using millions of annotated images and modify it for the purpose of object detection. The basic idea from the first R-CNN paper is illustrated in the figure below (taken from the paper): (1) Given an input image, (2) in a first step a large number of region proposals are generated. (3) These region proposals, or Regions-of-Interest (ROIs), are then each independently sent through the network, which outputs a vector of e.g. 4096 floating point values for each ROI. Finally, (4) a classifier is learned which takes the 4096-float ROI representation as input and outputs a label and confidence for each ROI.

![enter image description here][39]

While this approach works well in terms of accuracy, it is very costly to compute since the neural network has to be evaluated for each ROI. Fast R-CNN addresses this drawback by evaluating most of the network (to be specific: the convolutional layers) only a single time per image. According to the authors, this leads to a 213 times speed-up during testing and a 9x speed-up during training without loss of accuracy. This is achieved by an ROI pooling layer which projects the ROI onto the convolutional feature map and performs max pooling to generate the desired output size that the following layer expects. In the AlexNet example used in this tutorial, the ROI pooling layer is placed between the last convolutional layer and the first fully connected layer.
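The "projection" of an ROI onto the convolutional feature map amounts to dividing its image-space coordinates by the cumulative spatial stride of the convolutional layers. A schematic sketch (the stride value of 16 is only an illustrative assumption, roughly matching AlexNet up to conv5):

```python
def project_roi(roi, total_stride=16):
    """Map an image-space ROI (x1, y1, x2, y2) to conv feature-map coordinates.

    total_stride is the cumulative downsampling factor of the conv layers;
    16 is chosen here purely for illustration. The far corner is rounded up
    so the feature-map window always covers the full ROI.
    """
    x1, y1, x2, y2 = roi
    fx1, fy1 = x1 // total_stride, y1 // total_stride
    fx2 = -(-x2 // total_stride)  # ceiling division
    fy2 = -(-y2 // total_stride)
    return fx1, fy1, fx2, fy2
```

The pooled window of the shared feature map then replaces a full per-ROI network evaluation, which is where the speed-up comes from.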
The original Caffe implementations used in the R-CNN papers can be found on GitHub: [RCNN][41], [Fast R-CNN][42] and [Faster R-CNN][43]. This tutorial uses some of the code from these repositories, notably (but not exclusively) for SVM training and model evaluation.

# SVM vs NN training

Patrick Buehler provides instructions on how to train an SVM on the Cognitive Toolkit Fast R-CNN output (using the 4096 features from the last fully connected layer) as well as a discussion on pros and cons [here][44].

# Selective Search

[Selective Search][45] is a method for finding a large set of possible object locations in an image, independent of the class of the actual object. It works by clustering image pixels into segments and then performing hierarchical clustering to combine segments from the same object into object proposals.

![enter image description here][46] ![enter image description here][47] ![enter image description here][48]

To complement the ROIs detected by Selective Search, we add ROIs that uniformly cover the image at different scales and aspect ratios. The image on the left shows an example output of Selective Search, where each possible object location is visualized by a green rectangle. ROIs that are too small, too big, etc. are discarded (middle), and finally ROIs that uniformly cover the image are added (right). These rectangles are then used as Regions-of-Interest (ROIs) in the R-CNN pipeline.

The goal of ROI generation is to find a small set of ROIs that nevertheless tightly cover as many objects in the image as possible. This computation has to be sufficiently quick, while at the same time finding object locations at different scales and aspect ratios. Selective Search was shown to perform well for this task, with good accuracy-to-speed trade-offs.

# NMS (Non Maximum Suppression)

Object detection methods often output multiple detections which fully or partly cover the same object in an image.
These ROIs need to be merged to be able to count objects and obtain their exact locations in the image. This is traditionally done using a technique called Non Maximum Suppression (NMS). The version of NMS we use (which was also used in the R-CNN publications) does not merge ROIs but instead tries to identify the ROIs that best cover the real locations of an object and discards all other ROIs. This is implemented by iteratively selecting the ROI with the highest confidence and removing all other ROIs which significantly overlap this ROI and are classified to be of the same class. The threshold for the overlap can be set in `PARAMETERS.py` (see the **Parameters** section).

Detection results before (left) and after (right) Non Maximum Suppression:

![enter image description here][50] ![enter image description here][51]

# mAP (mean Average Precision)

Once trained, the quality of the model can be measured using different criteria, such as precision, recall, accuracy, area-under-curve, etc. A common metric used for the Pascal VOC object recognition challenge is to measure the Average Precision (AP) for each class; the mean Average Precision (mAP) is then computed by taking the average over the APs of all classes. The following description of Average Precision is taken from [Everingham et al.][52]:

*For a given task and class, the precision/recall curve is computed from a method's ranked output. Recall is defined as the proportion of all positive examples ranked above a given rank. Precision is the proportion of all examples above that rank which are from the positive class. The AP summarises the shape of the precision/recall curve, and is defined as the mean precision at a set of eleven equally spaced recall levels [0, 0.1, . . .
, 1]:*

![enter image description here][53]

*The precision at each recall level r is interpolated by taking the maximum precision measured for a method for which the corresponding recall exceeds r:*

![enter image description here][54]

*where p(r̃) is the measured precision at recall r̃. The intention in interpolating the precision/recall curve in this way is to reduce the impact of the "wiggles" in the precision/recall curve, caused by small variations in the ranking of examples. It should be noted that to obtain a high score, a method must have precision at all levels of recall – this penalises methods which retrieve only a subset of examples with high precision (e.g. side views of cars).*

  [1]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image1.jpg
  [2]: https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Detection/FastRCNN
  [3]: https://www.cntk.ai/Models/FRCN_Grocery/Fast-RCNN.model
  [4]: https://www.cntk.ai/Models/FRCN_Pascal/Fast-RCNN.model
  [5]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#example-data-and-baseline-model
  [6]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#run-pascal-voc
  [7]: https://github.com/Azure/ObjectDetectionUsingCntk
  [8]: https://github.com/Microsoft/CNTK/tree/master/Examples/Image/Detection/FastRCNN/fastRCNN/utils3_win64
  [9]: https://www.cntk.ai/Models/FRCN_Grocery/Fast-RCNN.model
  [10]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#using-a-pre-trained-model
  [11]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#map-mean-average-precision
  [12]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image2a.jpg
  [13]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image2.jpg
  [14]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image2c.jpg
  [15]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image2d.jpg
  [16]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image2e.jpg
  [17]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#selective-search
  [18]: https://github.com/Microsoft/CNTK/wiki/CNTKTextFormat-Reader
  [19]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#cntk-input-file-format
  [20]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#parameters
  [21]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#cntk-configuration
  [22]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#map-mean-average-precision
  [23]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#NMS-Non-Maximum-Suppression
  [24]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#parameters
  [25]: http://host.robots.ox.ac.uk/pascal/VOC/
  [26]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#using-a-pre-trained-model
  [27]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#using-a-pre-trained-model
  [28]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#map-mean-average-precision
  [29]: https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Detection/FastRCNN/fastRCNN/pascal_voc.py
  [30]: https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Detection/FastRCNN/fastRCNN/voc_eval.py
  [31]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#parameters
  [32]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#quick-guide
  [33]: https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Detection/FastRCNN/fastrcnn.cntk
  [34]: https://github.com/Microsoft/CNTK/wiki/CNTKTextFormat-Reader
  [35]: http://arxiv.org/abs/1311.2524
  [36]: http://host.robots.ox.ac.uk/pascal/VOC/
  [37]: https://arxiv.org/pdf/1504.08083v2.pdf
  [38]: https://arxiv.org/abs/1506.01497
  [39]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image3.jpg
  [40]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#CNTK-configuration
  [41]: https://github.com/rbgirshick/rcnn
  [42]: https://github.com/rbgirshick/fast-rcnn
  [43]: https://github.com/rbgirshick/py-faster-rcnn
  [44]: https://github.com/Azure/ObjectDetectionUsingCntk
  [45]: http://koen.me/research/pub/uijlings-ijcv2013-draft.pdf
  [46]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image4.jpg
  [47]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image4b.jpg
  [48]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image4c.jpg
  [49]: https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN#parameters
  [50]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image5.jpg
  [51]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image5b.jpg
  [52]: http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf
  [53]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image6.jpg
  [54]: https://az712634.vo.msecnd.net/tutorials/Fast-R-CNN/FastRCNN_Image7.jpg