
paper, code

TL;DR

  • task : probabilistic object detection
  • problem : bbox prediction distribution based on NLL loss tends to have high entropy regardless of whether the bbox is correct.
  • idea : use energy score instead of NLL loss -> lower entropy, better calibrated
  • architecture : RetinaNet, Faster-RCNN, DETR
  • objective : Energy Score
  • baseline : NLL loss, Direct Moment Matching (DMM)
  • data : COCO, Open Images
  • evaluation : Proposes a new metric to replace mAP. Among detections matched to a ground-truth box, those with IoU < 0.1 are false positives and those with IoU between 0.1 and 0.5 are localization errors; when multiple matched detections have IoU above 0.5, the one with the highest class score is the true positive and the rest are counted as duplicates. As with mAP, results are averaged over IoU thresholds from 0.5 to 0.95. Mean Calibration Error (MCE) for classification and Calibration Error (CE) for regression are also reported.
  • result : better calibrated, lower entropy, higher quality predictive distribution
  • contribution : proposes a new evaluation protocol for probabilistic object detection
  • limitation or something I don’t understand : what exactly are the local and non-local rules? and why is high entropy necessarily bad…
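The matching rule from the evaluation bullet can be sketched as a small helper. This is my own hypothetical sketch for a single ground-truth box (function and bucket names are mine, not the paper's code):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def classify(dets, gt, thresh=0.5):
    """Bucket detections matched to one GT box, following the rule above.

    dets: list of (box, class_score) pairs.
    IoU < 0.1 -> false positive; 0.1 <= IoU < thresh -> localization error;
    among IoU >= thresh, the highest class score is the true positive
    and the rest are duplicates.
    """
    out = {"fp": [], "loc_err": [], "tp": [], "dup": []}
    above = []
    for box, score in dets:
        v = iou(box, gt)
        if v < 0.1:
            out["fp"].append((box, score))
        elif v < thresh:
            out["loc_err"].append((box, score))
        else:
            above.append((box, score))
    if above:
        above.sort(key=lambda d: d[1], reverse=True)
        out["tp"].append(above[0])       # highest class score wins
        out["dup"].extend(above[1:])     # the rest become duplicates
    return out

# toy example: two overlapping high-IoU boxes, one miss, one loose box
gt = (0.0, 0.0, 10.0, 10.0)
dets = [((0.0, 0.0, 10.0, 10.0), 0.9),
        ((0.0, 0.0, 10.0, 10.0), 0.8),
        ((20.0, 20.0, 30.0, 30.0), 0.7),   # IoU = 0   -> fp
        ((0.0, 0.0, 10.0, 3.0), 0.6)]      # IoU = 0.3 -> loc_err
result = classify(dets, gt)
```

In the actual metric this bucketing would be repeated for every threshold from 0.5 to 0.95 and averaged, as with mAP.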

Details

Preliminaries

  • energy : $E(x)$ is called the energy when $p(x) \propto \exp(-E(x))$
  • scoring rule : a function that measures how well a predicted distribution over classes or bounding boxes, given features, matches the actually observed outcome
  • variance network https://github.com/long8v/PTIR/issues/92
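A minimal numpy sketch of what a variance network head looks like (toy shapes and names are my own, assuming a linear head that predicts a mean and a log-variance per bbox coordinate so the standard deviation stays positive):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical toy head: feature dim 8 -> 4 bbox coordinates
W_mu = rng.normal(size=(8, 4))
W_logvar = rng.normal(size=(8, 4))

def variance_head(feat):
    mu = feat @ W_mu               # predicted bbox mean (x1, y1, x2, y2)
    log_var = feat @ W_logvar      # unconstrained log-variance
    sigma = np.exp(0.5 * log_var)  # positive std-dev per coordinate
    return mu, sigma

feat = rng.normal(size=(8,))
mu, sigma = variance_head(feat)
```

Predicting the log-variance (rather than sigma directly) is a common trick to keep the output unconstrained while guaranteeing a positive standard deviation.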

Negative Log Likelihood as a scoring rule

NLL under a multivariate Gaussian $N(\mu(x_n, \theta), \Sigma(x_n, \theta))$:

$$\mathrm{NLL} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{2} \left( z_n - \mu(x_n, \theta) \right)^\top \Sigma(x_n, \theta)^{-1} \left( z_n - \mu(x_n, \theta) \right) + \frac{1}{2} \log \det \Sigma(x_n, \theta)$$
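As a numeric sketch of the Gaussian NLL with a diagonal covariance $\Sigma = \mathrm{diag}(\sigma^2)$ (my own toy numbers, not from the paper):

```python
import numpy as np

def gaussian_nll(z, mu, sigma):
    """Per-box NLL of a diagonal Gaussian N(mu, diag(sigma^2)).

    z, mu, sigma: arrays of shape (4,) for the 4 bbox coordinates.
    """
    var = sigma ** 2
    return 0.5 * np.sum((z - mu) ** 2 / var + np.log(2 * np.pi * var))

z = np.array([10.0, 10.0, 50.0, 50.0])   # ground-truth box
mu = np.array([12.0, 9.0, 48.0, 52.0])   # predicted mean
# larger sigma shrinks the quadratic term but pays a log-determinant penalty
nll_tight = gaussian_nll(z, mu, np.full(4, 1.0))
nll_loose = gaussian_nll(z, mu, np.full(4, 10.0))
```

The two terms pull in opposite directions: the quadratic term rewards large $\sigma$ when the mean is off, while the $\log\det$ term rewards small $\sigma$.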

Energy Score(ES)

$$\mathrm{ES}(P_\theta, z_n) = \mathbb{E} \left\| Z - z_n \right\| - \frac{1}{2} \mathbb{E} \left\| Z - Z' \right\|, \qquad Z, Z' \overset{\text{iid}}{\sim} P_\theta$$
  • $z_n$ : ground truth bounding box
  • $z_{n,i}$ : the $i^{th}$ sample drawn from $N(\mu(x_n, \theta), \sigma(x_n, \theta))$.

The energy score can be approximated with Monte Carlo samples as follows:

$$\mathrm{ES} \approx \frac{1}{M} \sum_{i=1}^{M} \left\| z_{n,i} - z_n \right\| - \frac{1}{2(M-1)} \sum_{i=1}^{M-1} \left\| z_{n,i} - z_{n,i+1} \right\|$$
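A numpy sketch of the Monte Carlo estimate, using sequential sample pairs for the second term (my own toy setup):

```python
import numpy as np

def energy_score_mc(z, samples):
    """Monte Carlo energy score.

    z: ground-truth box, shape (4,)
    samples: shape (M, 4), draws z_{n,i} from the predicted distribution
    """
    term1 = np.mean(np.linalg.norm(samples - z, axis=1))
    term2 = np.mean(np.linalg.norm(samples[:-1] - samples[1:], axis=1))
    return term1 - 0.5 * term2

rng = np.random.default_rng(0)
z = np.array([10.0, 10.0, 50.0, 50.0])
mu, sigma = z + 1.0, 2.0
samples = rng.normal(mu, sigma, size=(1000, 4))
es = energy_score_mc(z, samples)
```

Unlike the NLL, this only needs samples from the predicted distribution, so it applies even when the density has no closed form.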

Direct Moment Matching

(equation: Direct Moment Matching loss)
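As a rough, hypothetical sketch of the idea (my reading, not the paper's exact loss): DMM skips the likelihood entirely and directly regresses the predicted mean toward the ground truth and the predicted variance toward the empirical squared residual.

```python
import numpy as np

def dmm_loss(z, mu, sigma):
    """Hypothetical direct moment matching sketch (not the paper's exact form).

    First moment: pull mu toward z. Second moment: pull the predicted
    variance sigma^2 toward the observed squared residual. No log-det or
    likelihood term appears, unlike the Gaussian NLL.
    """
    resid = z - mu
    return np.sum(np.abs(resid)) + np.sum(np.abs(sigma ** 2 - resid ** 2))

loss = dmm_loss(np.array([10.0, 10.0, 50.0, 50.0]),
                np.array([12.0, 9.0, 48.0, 52.0]),
                np.full(4, 1.0))
```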

Motivation

(figure: NLL and energy score plotted against $\sigma$ for a fixed regression error)
  • NLL and the energy score attain their minima at similar values of $\sigma$.
  • away from the minimum they behave oppositely: NLL penalizes low entropy ($\sigma$ small) more heavily, while ES penalizes high entropy more heavily.
  • so NLL tends to learn high entropy whether the bbox is correct or incorrect -> but why exactly is that bad? I think I need to understand variance networks.
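A toy 1-D check of the figure's claim (my own sketch, not from the paper): fix the error $|z - \mu| = 1$ and sweep $\sigma$; the variable names `sigmas`, `nlls`, `ess` are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
z, mu = 0.0, 1.0  # fixed regression error |z - mu| = 1

def nll(sigma):
    """Closed-form 1-D Gaussian NLL."""
    return 0.5 * ((z - mu) ** 2 / sigma ** 2 + np.log(2 * np.pi * sigma ** 2))

def es(sigma, m=4000):
    """Monte Carlo 1-D energy score with sequential sample pairs."""
    s = rng.normal(mu, sigma, size=m)
    return np.mean(np.abs(s - z)) - 0.5 * np.mean(np.abs(s[:-1] - s[1:]))

sigmas = [0.1, 1.0, 10.0]
nlls = [nll(sg) for sg in sigmas]
ess = [es(sg) for sg in sigmas]
# both losses dip near sigma ~ 1, but NLL blows up as sigma -> 0
# (the quadratic term), while ES grows much more gently there
```

This reproduces the asymmetry in the motivation figure: underestimating $\sigma$ is far more expensive under NLL than under ES, which is consistent with NLL-trained detectors drifting toward high-entropy predictions.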

Results

(figures: quantitative results)