
TL;DR
- task : long-tailed object recognition
- Previous studies focused only on foreground-vs-background imbalance and did not address class imbalance within the foreground! With either sigmoid or softmax, rare classes are suppressed by the gradient coming from negative samples of frequent classes.
- IDEA : multiply the $log(p_j)$ term of the sigmoid / softmax loss by a frequency-based weight.
- architecture : ResNet-50 Mask R-CNN
- objective : equalization loss (proposed in this paper)
- baseline : sigmoid, softmax, class-aware sampling, class balanced loss, focal loss
- data : LVIS v0.5, CIFAR-100-LT, ImageNet-LT
- result : overall improvement in AP and AP50 over the baselines. Rare and frequent classes do somewhat worse than the baseline, while common classes do very well.
- contribution : probably the first paper to tackle class imbalance within the foreground?
Details
Motivation

As the figure (omitted here) shows, the rarer the class, the more the gradient from negative samples dominates the gradient from positive samples.
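A quick back-of-the-envelope illustration of this (my own sketch with hypothetical sample counts, not numbers from the paper): with a sigmoid classifier, every proposal of class $k$ contributes a negative gradient to every other class $j \ne k$, so under a long-tailed distribution a rare class sees negative updates far more often than positive ones.

```python
# Hypothetical long-tailed sample counts per class (illustration only).
counts = {"frequent": 10000, "common": 500, "rare": 20}
total = sum(counts.values())

for cls, n_pos in counts.items():
    # Every other class's samples act as negatives for `cls`.
    n_neg = total - n_pos
    print(f"{cls:>8}: pos={n_pos:>6} neg={n_neg:>6} neg/pos={n_neg / n_pos:.1f}")
```

The neg/pos ratio explodes for the rare class, which is exactly the gradient imbalance the equalization loss tries to suppress.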
Equalization Loss Formulation
$$L_{EQL} = -\sum_{j=1}^{C} w_j \log(\hat{p}_j), \qquad w_j = 1 - E(r)\, T_\lambda(f_j)\,(1 - y_j)$$
- $E(r)$ : 1 if proposal $r$ is foreground, 0 if background
- $f_j$: frequency of class j
- Thresholding $T_\lambda(x)$ : 1 if $x < \lambda$, 0 otherwise
$\lambda$ is chosen by looking at the Tail Ratio (TR) below; no value of $\lambda$ is better or worse in absolute terms, it just trades off frequent vs. rare performance depending on the value.

Softmax Equalization Loss Formulation

- multiply weight by the denominator only

- $\beta$: Random variable that is 1 with probability $\gamma$ and 0 with probability $1-\gamma
Result

Adding it improves performance across the board!

Better overall compared to other long-tail losses, but worse than sampling methods for rare, frequent cases Definitely better than Focal!
Ablation
Higher tail ratio means better for frequent classes and worse for rare -> $\lambda$ is fully hyperparametric

Ablation for E(r), replacing 1 if background. rare becomes bad.