
TL;DR
- task : dense object detection with localization score
- Problem : Existing approaches to localization quality estimation 1) train the IoU score branch separately from classification and only combine the two at inference, leaving a gap between training and inference, 2) supervise localization quality only on positives, so negative samples can end up with very high IoU scores, and 3) assume the bbox distribution is a Dirac delta or a Gaussian, which is too simple.
- idea : Merge the category and IoU scores into a single prediction trained against a smooth (soft) target, which removes the train-inference gap, and learn a general distribution over bbox locations instead of imposing a fixed distributional form.
- architecture : ResNet with FPN + ???
- objective : 1) Quality Focal Loss: scale the cross-entropy term by $|y-\sigma|^\beta$, the distance between the prediction $\sigma$ and the soft target $y$, instead of focal loss's $(1-p_t)^\gamma$, and 2) Distribution Focal Loss: a loss on the learned discrete bbox distribution. Together => Generalized Focal Loss
- baseline : w/o quality branch, IoU branch, centerness-guided, IoU guided
- data : COCO
- result : better performance than having no quality branch, and better than the IoU branch in most of the remaining comparisons, with a few exceptions.
- contribution :
- Limitations or things I don’t understand : I don’t really understand the Distribution Focal Loss part, and I’m not sure what architecture the experiments use. Is it just ResNet + FPN plus a bbox prediction at every pixel? What is ATSS?
Details
Problems with traditional methods


Key Ideas in Generalized Focal Loss

focal loss
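
For reference, the standard focal loss that the $(1-p_t)^\gamma$ term in the TL;DR comes from can be written as below. Here $p$ is the predicted foreground probability and $y \in \{0, 1\}$ is the hard label; the notation is my own shorthand and may differ from the paper's.

$$
\mathrm{FL}(p) = -(1 - p_t)^{\gamma} \log(p_t), \qquad
p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}
$$

The factor $(1-p_t)^\gamma$ down-weights easy examples, but the label has to be 0 or 1, so a continuous IoU target does not fit this form.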

quality focal loss
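
The objective in the TL;DR can be sketched for a single class score as below. This is a minimal plain-Python illustration, not the authors' implementation: the function name `quality_focal_loss`, the default `beta=2.0`, and the `eps` clamp are assumptions made for the sketch.

```python
import math

def quality_focal_loss(sigma, y, beta=2.0, eps=1e-12):
    """Quality Focal Loss for one class score (illustrative sketch).

    sigma: predicted joint classification-IoU score in (0, 1), after sigmoid
    y:     soft target in [0, 1]: IoU with the matched box for positives,
           0 for negatives
    beta:  exponent of the modulating factor (assumed default)
    """
    # Clamp to avoid log(0); eps is an assumption of this sketch.
    sigma = min(max(sigma, eps), 1.0 - eps)
    # Cross-entropy against a soft target y instead of a hard 0/1 label.
    ce = -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))
    # Modulating factor: distance between prediction and target.
    return abs(y - sigma) ** beta * ce
```

With `y = 1` this reduces to the usual focal loss for a positive with `gamma = beta`, and the loss is zero when `sigma == y`, which is what lets the target be a continuous IoU value.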

distribution focal loss
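
Since the Distribution Focal Loss part is flagged as unclear in the TL;DR, here is how I read it, as a hedged sketch. Each bbox side is predicted as a softmax over discrete offset bins, and the loss pushes probability mass onto the two bins nearest the continuous target. The bin spacing of 1, the helper names, and the `eps` clamp are assumptions of this sketch, not details confirmed by the notes above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distribution_focal_loss(logits, y, eps=1e-12):
    """DFL for one bbox side (illustrative sketch).

    logits: n+1 raw scores for bins at integer offsets 0..n (assumed spacing 1)
    y:      continuous target offset in [0, n]
    """
    n = len(logits) - 1
    probs = softmax(logits)
    left = min(int(math.floor(y)), n - 1)  # nearest bin at or below y
    right = left + 1                       # nearest bin above y
    w_left = right - y                     # closer bin gets the larger weight
    w_right = y - left
    return -(w_left * math.log(max(probs[left], eps))
             + w_right * math.log(max(probs[right], eps)))

def expected_offset(logits):
    """Decode the regressed offset as the expectation of the distribution."""
    return sum(i * p for i, p in enumerate(softmax(logits)))
```

For example, with four bins and a target of 1.5, logits concentrated on bins 1 and 2 give a lower loss than uniform logits, while both decode to an expected offset of about 1.5. The loss only constrains the two neighbouring bins, so the rest of the distribution is free to take whatever shape the data supports.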


generalized focal loss
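
As I understand it, the two losses are unified in one form. Take a continuous target $y$ lying between two neighbouring values $y_l \le y \le y_r$, with predicted probabilities $p_{y_l}$ and $p_{y_r}$ and the estimate $\hat{y} = y_l\, p_{y_l} + y_r\, p_{y_r}$. The symbols are my own notation and may not match the paper exactly.

$$
\mathrm{GFL}(p_{y_l}, p_{y_r}) = -\left| y - \hat{y} \right|^{\beta} \left( (y_r - y) \log p_{y_l} + (y - y_l) \log p_{y_r} \right)
$$

Setting $y_l = 0$, $y_r = 1$, $p_{y_r} = \sigma$, $p_{y_l} = 1 - \sigma$ gives $\hat{y} = \sigma$ and recovers the quality focal loss $-|y-\sigma|^\beta \left( (1-y)\log(1-\sigma) + y \log \sigma \right)$. Setting $\beta = 0$ and taking $y_l, y_r$ to be the two bins around the target recovers the distribution focal loss.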

Result
