[70] SSD: Single Shot MultiBox Detector

TL;DR

task : real-time object detection
problem : Unlike two-stage detectors such as Faster RCNN that do region proposal + pooling, we want to predict box and cls at once.
idea : Similar to anchors in Faster RCNN, place default boxes of different sizes / aspect ratios at each feature map location, then predict the relative localization (dx, dy, dw, dh) for each default box together with the confidence for all classes. Do this on multi-scale feature maps.
architecture : Attach extra feature layers after VGG-16 so that the feature maps become progressively smaller. To each feature map, attach a convolutional head that predicts (num of classes + 4(=coordinates)) * num of default boxes values per location.
objective : Weighted sum of cross entropy loss for class confidence and localization loss for the box offsets. Each default box is matched to any gt box with jaccard > 0.5, so one gt can have several positive default boxes.
baseline : Faster RCNN, YOLO
data : PASCAL VOC, COCO, ILSVRC
result : Faster inference than Faster RCNN and better performance than YOLO
contribution : box and cls prediction all at once, without any region proposal!
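
The idea and architecture lines above can be made concrete with a small sketch. It is not the authors' code: the function names are made up for illustration, boxes are assumed to be (cx, cy, w, h), and the feature map layout in `ASSUMED_LAYOUT` (map size, default boxes per location) is an assumed SSD300-style configuration, not something stated in these notes.

```python
import math

def encode(gt, default):
    """Offsets (dx, dy, dw, dh) of a gt box relative to a default box.

    Both boxes are (cx, cy, w, h). Center shifts are scaled by the default
    box size, width / height are log ratios.
    """
    gcx, gcy, gw, gh = gt
    dcx, dcy, dw, dh = default
    return ((gcx - dcx) / dw, (gcy - dcy) / dh,
            math.log(gw / dw), math.log(gh / dh))

def decode(offsets, default):
    """Inverse of encode: turn predicted offsets back into a (cx, cy, w, h) box."""
    tx, ty, tw, th = offsets
    dcx, dcy, dw, dh = default
    return (dcx + tx * dw, dcy + ty * dh,
            dw * math.exp(tw), dh * math.exp(th))

def head_outputs(fmap_size, num_boxes, num_classes):
    """Number of values one head predicts for an m x n feature map:
    (num of classes + 4) * num of default boxes, at every location."""
    m, n = fmap_size
    return m * n * num_boxes * (num_classes + 4)

# Assumption: (feature map side, default boxes per location) for each scale.
ASSUMED_LAYOUT = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
total_default_boxes = sum(side * side * k for side, k in ASSUMED_LAYOUT)

if __name__ == "__main__":
    default = (0.45, 0.55, 0.3, 0.3)
    gt = (0.5, 0.5, 0.4, 0.2)
    print(encode(gt, default))
    print(decode(encode(gt, default), default))
    print(total_default_boxes)
```

Because every scale contributes its own default boxes, the total number of predictions grows quickly, which is why the matching and negative sampling steps matter during training.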

hard negative mining : among the negative default boxes, keep the ones with the highest confidence loss, so that positive : negative is at most 1 : 3 (in number of default boxes).
augmentation : random crop so that the jaccard with the objects is above a threshold. There is also an augmentation for small objects.
Faster RCNN : anchors and default boxes are the same concept! The approach still differs, it seems, because Faster RCNN doesn't use multi-scale features. There is also the two-stage vs one-stage difference.
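
The jaccard matching and hard negative mining described in these notes can be sketched in plain Python. This is a minimal illustration under assumptions, not the reference implementation: boxes are (xmin, ymin, xmax, ymax), the helper names are hypothetical, each default box keeps only its best gt above the threshold, and the per-box confidence losses are passed in as a plain list.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match(defaults, gts, threshold=0.5):
    """For each default box, return the index of its matched gt, or -1 (negative)."""
    matched = [-1] * len(defaults)
    # A default box is positive if its overlap with some gt exceeds the threshold.
    for i, d in enumerate(defaults):
        best_j, best_overlap = -1, threshold
        for j, g in enumerate(gts):
            overlap = jaccard(d, g)
            if overlap > best_overlap:
                best_j, best_overlap = j, overlap
        matched[i] = best_j
    # Every gt also claims its single best default box, so no gt is left unmatched.
    for j, g in enumerate(gts):
        best_i = max(range(len(defaults)), key=lambda i: jaccard(defaults[i], g))
        matched[best_i] = j
    return matched

def hard_negatives(matched, conf_loss, ratio=3):
    """Keep the negatives with the highest confidence loss, at most ratio * num positives."""
    negatives = [i for i, m in enumerate(matched) if m < 0]
    num_pos = len(matched) - len(negatives)
    negatives.sort(key=lambda i: conf_loss[i], reverse=True)
    return negatives[: ratio * num_pos]

if __name__ == "__main__":
    defaults = [(0, 0, 10, 10), (0, 0, 10, 6), (20, 20, 30, 30),
                (40, 40, 50, 50), (60, 60, 70, 70), (80, 80, 90, 90)]
    gts = [(0, 0, 10, 10)]
    matched = match(defaults, gts)
    print(matched)
    print(hard_negatives(matched, [0.1, 0.2, 0.9, 0.3, 0.7, 0.5], ratio=1))
```

Only the positives and the selected negatives would then contribute to the confidence loss, which keeps the overwhelming number of background boxes from dominating training.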