
paper

TL;DR

  • task : dense object detection with localization score
  • problem : Existing methods that predict a localization quality score 1) train the quality branch (IoU or centerness) separately from the classification branch but multiply the two scores at inference, creating a gap between training and inference; 2) supervise localization quality only on positive samples, so negatives can receive uncontrollably high quality scores; and 3) model the bbox as a Dirac delta or Gaussian distribution, which is too rigid to capture ambiguous or uncertain boundaries.
  • idea : Merge the classification score and the IoU score into a single joint representation trained with a soft (IoU-valued) target. This removes the training-inference gap and also supervises negatives. In addition, learn a general discretized distribution over bbox offsets instead of assuming a fixed distribution shape.
  • architecture : ResNet with FPN + ???
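  • objective : 1) Quality Focal Loss: replace the focal-loss modulating factor $(1-p_t)^\gamma$ with $|y-\sigma|^\beta$, the distance between the prediction $\sigma$ and the soft IoU target $y$, and extend the cross-entropy term to its full binary form so it accepts continuous labels; 2) Distribution Focal Loss: train the learned discrete distribution to put its mass on the two bins nearest the target value. Both are special cases of one Generalized Focal Loss.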
  • baseline : w/o quality branch, IoU branch, centerness-guided, IoU-guided
  • data : COCO
  • result : better than the baseline without a quality branch; in most of the remaining comparisons it also beats the IoU-branch baseline, with a few exceptions.
  • contribution :
  • Limitations or things I don’t understand : I don’t really understand the Distribution Focal Loss part, and I’m not sure which architecture the experiments use. Is it just ResNet + FPN with a bbox prediction at every location? What is ATSS?

Details

Problems with traditional methods

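A toy illustration of problems 1) and 2), using made-up scores: when the IoU branch is trained only on positives but multiplied with the classification score at inference, a background sample with a spuriously high IoU prediction can outrank a true positive. The numbers and the helper `inference_score` are hypothetical.

```python
def inference_score(cls_score: float, iou_score: float) -> float:
    """Ranking score at test time when two separately trained branches are combined."""
    return cls_score * iou_score


# Hypothetical predictions for two locations (illustrative numbers only).
true_positive = inference_score(cls_score=0.6, iou_score=0.5)
# The IoU branch never saw negatives during training, so 0.9 here is unconstrained.
background = inference_score(cls_score=0.4, iou_score=0.9)

print(f"true positive: {true_positive:.2f}, background: {background:.2f}")
# prints "true positive: 0.30, background: 0.36", so ranking/NMS would put the background first
```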

Key Ideas in Generalized Focal Loss

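A minimal sketch of the classification-IoU joint target that replaces the one-hot label, assuming per-class sigmoid outputs as in focal-loss detectors. `joint_target` is a hypothetical helper for illustration, not code from the paper.

```python
from typing import List, Optional


def joint_target(num_classes: int, gt_class: Optional[int] = None, iou: float = 0.0) -> List[float]:
    """Soft classification-IoU target for one location (hypothetical helper).

    Positive: the ground-truth class entry holds the IoU between the predicted
    box and its ground-truth box; all other entries are 0.
    Negative (gt_class is None): every entry is 0, so the joint score of
    negatives is pushed down during training as well.
    """
    if not 0.0 <= iou <= 1.0:
        raise ValueError("iou must lie in [0, 1]")
    target = [0.0] * num_classes
    if gt_class is not None:
        target[gt_class] = iou
    return target


print(joint_target(4, gt_class=2, iou=0.83))  # positive: [0.0, 0.0, 0.83, 0.0]
print(joint_target(4))                         # negative: [0.0, 0.0, 0.0, 0.0]
```

Because the same sigmoid output is trained against this target and then used directly as the ranking score, training and inference see the same quantity.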

  • focal loss

  • quality focal loss

  • distribution focal loss
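For reference, the standard forms of the three losses (the α-balancing term of the focal loss is omitted). Notation: $y$ is the target, $\sigma$ (or $p$) the sigmoid output, $y_i \le y \le y_{i+1}$ the two bins nearest the continuous target with unit spacing, and $\mathcal{S}_i = P(y_i)$ the softmax probability of bin $y_i$.

$$
\mathrm{FL}(p) = -(1-p_t)^{\gamma}\log(p_t), \qquad p_t = \begin{cases} p, & y = 1 \\ 1-p, & y = 0 \end{cases}
$$

$$
\mathrm{QFL}(\sigma) = -\lvert y-\sigma\rvert^{\beta}\big((1-y)\log(1-\sigma) + y\log(\sigma)\big), \qquad y \in [0, 1]
$$

$$
\mathrm{DFL}(\mathcal{S}_i, \mathcal{S}_{i+1}) = -\big((y_{i+1}-y)\log(\mathcal{S}_i) + (y-y_i)\log(\mathcal{S}_{i+1})\big), \qquad \hat{y} = \sum_{i=0}^{n} P(y_i)\,y_i
$$

DFL is minimized when $\mathcal{S}_i = y_{i+1}-y$, $\mathcal{S}_{i+1} = y-y_i$ and all other bins are 0; then $\hat{y} = y_i(y_{i+1}-y) + y_{i+1}(y-y_i) = y$, so the expected value of the distribution recovers the target exactly.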

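A minimal scalar sketch of QFL, DFL, and the integral that turns the learned distribution back into an offset, in plain Python with `math` only. Real implementations work on batched tensors; the function names here are hypothetical, and bins are assumed to be the integers $0, \dots, n$.

```python
import math
from typing import List


def quality_focal_loss(sigma: float, y: float, beta: float = 2.0) -> float:
    """QFL for one sigmoid output `sigma` and a soft IoU target `y` in [0, 1]."""
    eps = 1e-12  # keeps log() finite when sigma is exactly 0 or 1
    bce = -((1.0 - y) * math.log(max(1.0 - sigma, eps)) + y * math.log(max(sigma, eps)))
    return abs(y - sigma) ** beta * bce


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over the bins."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def distribution_focal_loss(logits: List[float], y: float) -> float:
    """DFL for one box side; bins are the integers 0..n with n = len(logits) - 1."""
    n = len(logits) - 1
    if n < 1 or not 0.0 <= y <= n:
        raise ValueError("need at least two bins and 0 <= y <= n")
    log_p = log_softmax(logits)
    left = min(math.floor(y), n - 1)  # y_i, the bin just below y
    right = left + 1                  # y_{i+1}, the bin just above y
    return -((right - y) * log_p[left] + (y - left) * log_p[right])


def expected_offset(logits: List[float]) -> float:
    """Integral of the learned distribution: y_hat = sum_i P(y_i) * y_i."""
    return sum(math.exp(lp) * i for i, lp in enumerate(log_softmax(logits)))


# Toy check with two bins and target 0.7: putting 0.3 / 0.7 of the mass on the
# two neighbouring bins gives a lower DFL than a uniform split, and its
# expected value recovers the target.
good = [math.log(0.3), math.log(0.7)]
print(distribution_focal_loss(good, 0.7) < distribution_focal_loss([0.0, 0.0], 0.7))  # True
print(round(expected_offset(good), 6))  # 0.7
```

In GFL the distribution is predicted separately for each box side (distances to the left, top, right and bottom edges), and DFL is added to a GIoU box-regression loss; this sketch covers a single side only.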

  • generalized focal loss
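For reference, the general form of GFL. Here $y_l \le y \le y_r$ are two values bracketing the continuous target $y$, and the model predicts probabilities $p_{y_l}$ and $p_{y_r}$ with $p_{y_l} + p_{y_r} = 1$:

$$
\mathrm{GFL}(p_{y_l}, p_{y_r}) = -\big\lvert y - (y_l\,p_{y_l} + y_r\,p_{y_r})\big\rvert^{\beta}\big((y_r - y)\log(p_{y_l}) + (y - y_l)\log(p_{y_r})\big)
$$

Special cases: with $y_l = 0$, $y_r = 1$, $p_{y_r} = \sigma$, $p_{y_l} = 1-\sigma$ it becomes QFL; with $\beta = 0$ and $y_l, y_r$ the two bins around the target it becomes DFL; with a binary target $y \in \{0, 1\}$ and $\beta = \gamma$ it reduces to the focal loss.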

Result
