
paper, code

TL;DR

  • task : probabilistic object detection
  • problem : bbox prediction distribution based on NLL loss tends to have high entropy regardless of whether the bbox is correct.
  • idea : use energy score instead of NLL loss -> lower entropy, better calibrated
  • architecture : RetinaNet, Faster-RCNN, DETR
  • objective : Energy Score
  • baseline : NLL loss, Direct Moment Matching (DMM)
  • data : COCO, Open Images
  • evaluation : Proposes a new metric to replace mAP. Among detections matched to a ground-truth box, those with IoU < 0.1 are false positives and those with IoU between 0.1 and 0.5 are localization errors; when multiple matched detections have IoU above 0.5, the one with the highest class score is the true positive and the rest are counted as duplicates. As with mAP, results are averaged over IoU thresholds from 0.5 to 0.95. Mean Calibration Error (MCE) for classification and Calibration Error (CE) for regression are also reported.
  • result : better calibrated, lower entropy, higher quality predictive distribution
  • contribution : proposes a new evaluation protocol for probabilistic object detection
  • limitation or something I don’t understand : what exactly are the local and non-local rules? and why is high entropy necessarily bad…
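The matching rule from the evaluation bullet can be sketched as a small helper. This is my own hypothetical sketch for a single ground-truth box (function and bucket names are mine, not the paper's code):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def classify(dets, gt, thresh=0.5):
    """Bucket detections matched to one GT box, following the rule above.

    dets: list of (box, class_score) pairs.
    IoU < 0.1 -> false positive; 0.1 <= IoU < thresh -> localization error;
    among IoU >= thresh, the highest class score is the true positive
    and the rest are duplicates.
    """
    out = {"fp": [], "loc_err": [], "tp": [], "dup": []}
    above = []
    for box, score in dets:
        v = iou(box, gt)
        if v < 0.1:
            out["fp"].append((box, score))
        elif v < thresh:
            out["loc_err"].append((box, score))
        else:
            above.append((box, score))
    if above:
        above.sort(key=lambda d: d[1], reverse=True)
        out["tp"].append(above[0])       # highest class score wins
        out["dup"].extend(above[1:])     # the rest become duplicates
    return out

# toy example: two overlapping high-IoU boxes, one miss, one loose box
gt = (0.0, 0.0, 10.0, 10.0)
dets = [((0.0, 0.0, 10.0, 10.0), 0.9),
        ((0.0, 0.0, 10.0, 10.0), 0.8),
        ((20.0, 20.0, 30.0, 30.0), 0.7),   # IoU = 0   -> fp
        ((0.0, 0.0, 10.0, 3.0), 0.6)]      # IoU = 0.3 -> loc_err
result = classify(dets, gt)
```

In the actual metric this bucketing would be repeated for every threshold from 0.5 to 0.95 and averaged, as with mAP.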

Details

Preliminaries

  • energy : $E(x)$ is called the energy when $p(x) \propto \exp(-E(x))$
  • scoring rule : a function that measures how well a predicted distribution over classes or bounding boxes, given features, matches the actually observed outcome
  • variance network https://github.com/long8v/PTIR/issues/92
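A minimal numpy sketch of what a variance network head looks like (toy shapes and names are my own, assuming a linear head that predicts a mean and a log-variance per bbox coordinate so the standard deviation stays positive):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical toy head: feature dim 8 -> 4 bbox coordinates
W_mu = rng.normal(size=(8, 4))
W_logvar = rng.normal(size=(8, 4))

def variance_head(feat):
    mu = feat @ W_mu               # predicted bbox mean (x1, y1, x2, y2)
    log_var = feat @ W_logvar      # unconstrained log-variance
    sigma = np.exp(0.5 * log_var)  # positive std-dev per coordinate
    return mu, sigma

feat = rng.normal(size=(8,))
mu, sigma = variance_head(feat)
```

Predicting the log-variance (rather than sigma directly) is a common trick to keep the output unconstrained while guaranteeing a positive standard deviation.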

Negative Log Likelihood as a scoring rule

NLL under a multivariate Gaussian $N(\mu(x_n, \theta), \Sigma(x_n, \theta))$:

$$\mathrm{NLL} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{2} \left( z_n - \mu(x_n, \theta) \right)^\top \Sigma(x_n, \theta)^{-1} \left( z_n - \mu(x_n, \theta) \right) + \frac{1}{2} \log \det \Sigma(x_n, \theta)$$
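As a numeric sketch of the Gaussian NLL with a diagonal covariance $\Sigma = \mathrm{diag}(\sigma^2)$ (my own toy numbers, not from the paper):

```python
import numpy as np

def gaussian_nll(z, mu, sigma):
    """Per-box NLL of a diagonal Gaussian N(mu, diag(sigma^2)).

    z, mu, sigma: arrays of shape (4,) for the 4 bbox coordinates.
    """
    var = sigma ** 2
    return 0.5 * np.sum((z - mu) ** 2 / var + np.log(2 * np.pi * var))

z = np.array([10.0, 10.0, 50.0, 50.0])   # ground-truth box
mu = np.array([12.0, 9.0, 48.0, 52.0])   # predicted mean
# larger sigma shrinks the quadratic term but pays a log-determinant penalty
nll_tight = gaussian_nll(z, mu, np.full(4, 1.0))
nll_loose = gaussian_nll(z, mu, np.full(4, 10.0))
```

The two terms pull in opposite directions: the quadratic term rewards large $\sigma$ when the mean is off, while the $\log\det$ term rewards small $\sigma$.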

Energy Score(ES)

$$\mathrm{ES}(P_\theta, z_n) = \mathbb{E} \left\| Z - z_n \right\| - \frac{1}{2} \mathbb{E} \left\| Z - Z' \right\|, \qquad Z, Z' \overset{\text{iid}}{\sim} P_\theta$$
  • $z_n$ : ground truth bounding box
  • $z_{n,i}$ : the $i^{th}$ sample drawn from $N(\mu(x_n, \theta), \sigma(x_n, \theta))$.

The energy score can be approximated with Monte Carlo samples as follows:

$$\mathrm{ES} \approx \frac{1}{M} \sum_{i=1}^{M} \left\| z_{n,i} - z_n \right\| - \frac{1}{2(M-1)} \sum_{i=1}^{M-1} \left\| z_{n,i} - z_{n,i+1} \right\|$$
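A numpy sketch of the Monte Carlo estimate, using sequential sample pairs for the second term (my own toy setup):

```python
import numpy as np

def energy_score_mc(z, samples):
    """Monte Carlo energy score.

    z: ground-truth box, shape (4,)
    samples: shape (M, 4), draws z_{n,i} from the predicted distribution
    """
    term1 = np.mean(np.linalg.norm(samples - z, axis=1))
    term2 = np.mean(np.linalg.norm(samples[:-1] - samples[1:], axis=1))
    return term1 - 0.5 * term2

rng = np.random.default_rng(0)
z = np.array([10.0, 10.0, 50.0, 50.0])
mu, sigma = z + 1.0, 2.0
samples = rng.normal(mu, sigma, size=(1000, 4))
es = energy_score_mc(z, samples)
```

Unlike the NLL, this only needs samples from the predicted distribution, so it applies even when the density has no closed form.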

Direct Moment Matching

(equation: Direct Moment Matching loss)
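As a rough, hypothetical sketch of the idea (my reading, not the paper's exact loss): DMM skips the likelihood entirely and directly regresses the predicted mean toward the ground truth and the predicted variance toward the empirical squared residual.

```python
import numpy as np

def dmm_loss(z, mu, sigma):
    """Hypothetical direct moment matching sketch (not the paper's exact form).

    First moment: pull mu toward z. Second moment: pull the predicted
    variance sigma^2 toward the observed squared residual. No log-det or
    likelihood term appears, unlike the Gaussian NLL.
    """
    resid = z - mu
    return np.sum(np.abs(resid)) + np.sum(np.abs(sigma ** 2 - resid ** 2))

loss = dmm_loss(np.array([10.0, 10.0, 50.0, 50.0]),
                np.array([12.0, 9.0, 48.0, 52.0]),
                np.full(4, 1.0))
```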

Motivation

(figure: NLL and energy score plotted against $\sigma$ for a fixed regression error)
  • NLL and the energy score attain their minima at similar values of $\sigma$.
  • away from the minimum they behave oppositely: NLL penalizes low entropy ($\sigma$ small) more heavily, while ES penalizes high entropy more heavily.
  • so NLL tends to learn high entropy whether the bbox is correct or incorrect -> but why exactly is that bad? I think I need to understand variance networks.
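A toy 1-D check of the figure's claim (my own sketch, not from the paper): fix the error $|z - \mu| = 1$ and sweep $\sigma$; the variable names `sigmas`, `nlls`, `ess` are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
z, mu = 0.0, 1.0  # fixed regression error |z - mu| = 1

def nll(sigma):
    """Closed-form 1-D Gaussian NLL."""
    return 0.5 * ((z - mu) ** 2 / sigma ** 2 + np.log(2 * np.pi * sigma ** 2))

def es(sigma, m=4000):
    """Monte Carlo 1-D energy score with sequential sample pairs."""
    s = rng.normal(mu, sigma, size=m)
    return np.mean(np.abs(s - z)) - 0.5 * np.mean(np.abs(s[:-1] - s[1:]))

sigmas = [0.1, 1.0, 10.0]
nlls = [nll(sg) for sg in sigmas]
ess = [es(sg) for sg in sigmas]
# both losses dip near sigma ~ 1, but NLL blows up as sigma -> 0
# (the quadratic term), while ES grows much more gently there
```

This reproduces the asymmetry in the motivation figure: underestimating $\sigma$ is far more expensive under NLL than under ES, which is consistent with NLL-trained detectors drifting toward high-entropy predictions.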

Results

(figures: quantitative results)