[80] Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

TL;DR

task : dense object detection with a localization quality score
problem : Existing localization-quality estimation methods 1) train the IoU score branch separately and only combine it with the classification score at inference, creating a train-inference gap, 2) supervise localization quality only on positives, so negative samples can receive very high IoU scores, and 3) model the bbox distribution as a Dirac delta or Gaussian, which is too simple.
idea : Merge the category and IoU scores into a single soft training target to remove the train-inference gap, and learn a general distribution over bbox locations to drop the strong prior on its shape.
architecture : ResNet with FPN + ???
objective : 1) replace the focal loss modulating factor $(1-p_t)^\gamma$ with $|y-\sigma|^\beta$, the distance between the prediction $\sigma$ and the soft target $y$ (Quality Focal Loss), and 2) add a loss on the learned discrete bbox distribution (Distribution Focal Loss) => Generalized Focal Loss
baseline : w/o quality branch, IoU-branch, centerness-guided, IoU-guided
data : COCO
result : outperforms the baseline without a quality branch, and beats the IoU-branch baseline on most of the remaining metrics, with a few exceptions.
contribution :
Limitations or things I don’t understand : I don’t really understand the Distribution Focal Loss part, and I’m not sure which architecture the experiments use. Is it just ResNet + FPN plus a bbox prediction at every pixel? What is ATSS?
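
Sketch for the two loss terms: to make the objective more concrete, here is a minimal pure-Python version of how I read the two losses. The Quality Focal Loss is taken as $-|y-\sigma|^\beta\,((1-y)\log(1-\sigma) + y\log\sigma)$, and the Distribution Focal Loss as $-((y_{i+1}-y)\log S_i + (y-y_i)\log S_{i+1})$, where $y_i \le y \le y_{i+1}$ are the two bins around the target and $S$ is the softmax over bins. The function names, the default `beta=2.0`, the `eps` constant, and the bin layout (integer offsets 0..n) are assumptions made for illustration, not taken from the authors' code.

```python
import math

EPS = 1e-12  # assumed small constant to keep log() finite


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quality_focal_loss(logit: float, y: float, beta: float = 2.0) -> float:
    """QFL for one class logit with a soft target y in [0, 1].

    y is meant to be the IoU with the matched box for the positive class
    and 0 for every other class (and for negatives).
    """
    s = sigmoid(logit)
    # Cross-entropy against a soft (non-binary) target.
    ce = -((1.0 - y) * math.log(1.0 - s + EPS) + y * math.log(s + EPS))
    # Modulating factor: distance between prediction and target.
    return abs(y - s) ** beta * ce


def distribution_focal_loss(logits, y: float) -> float:
    """DFL for one box side; bin i is assumed to sit at offset i."""
    n = len(logits) - 1
    y = min(max(y, 0.0), n - 1e-6)  # keep the right neighbour in range
    left = int(math.floor(y))
    right = left + 1
    probs = softmax(logits)
    # The closer bin gets the larger weight (linear interpolation).
    w_left = right - y
    w_right = y - left
    return -(w_left * math.log(probs[left] + EPS)
             + w_right * math.log(probs[right] + EPS))


def expected_offset(logits) -> float:
    """Regressed offset = expectation of the discrete distribution."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))
```

Under these assumptions, `quality_focal_loss` vanishes when the predicted score equals the soft target, and `distribution_focal_loss` is smallest when the probability mass sits on the two bins next to the target, which is how I understand the "learn the distribution" part: the box offset is then read out with `expected_offset`.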