
TL;DR
- task : Scene Graph Generation
- problem : SGG datasets are long-tailed: many relations are left unlabeled, and only a few predicate classes appear frequently.
- idea : Treat the problem as Positive-Unlabeled (PU) learning and divide the predicted probability of each class by that class’s label frequency.
- architecture : object detector + GNN?
- objective : cross entropy loss
- baseline : MOTIFS, …
- data : Visual Genome, Visual Genome150
- result : Appears to be the current SOTA for SGDet on VG150.
- contribution : addresses the long-tail issue
- Limitations or things I don’t understand :
Details

Recovering the Unbiased Scene Graph

- s : labeled predicate
- y : true predicate
- r : target predicate
unbiased probability

If we assume that the probability of being labeled is independent of x (Selected Completely at Random, SCAR), we can write

p(s=r|y=r) is ultimately the fraction of examples whose true class is r that are actually labeled r.
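
In the notation above, the SCAR argument can be sketched as follows. This assumes the standard PU-learning setup, where an example can only be labeled r if its true predicate is r, and where the labeling probability does not depend on x:

$$
p(s=r \mid x) = p(s=r \mid y=r, x)\, p(y=r \mid x) = p(s=r \mid y=r)\, p(y=r \mid x)
$$

$$
\Rightarrow \quad p(y=r \mid x) = \frac{p(s=r \mid x)}{p(s=r \mid y=r)}
$$

So the unbiased probability is the biased (labeled) probability divided by the label frequency of class r.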
Dynamic Label Frequency Estimation
We need an estimate of p(s=r|y=r) above, i.e., the label frequency.

This expression is derived from
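
A sketch of where such an estimate can come from, assuming the standard PU-learning estimator rather than the paper’s exact formula. For an example x that is labeled r we know y=r, so p(y=r|x)=1 and the biased probability equals the label frequency. Averaging over the set V_r of labeled examples of class r (V_r is notation introduced here for illustration) gives:

$$
p(s=r \mid x) = p(s=r \mid y=r)\, p(y=r \mid x) = p(s=r \mid y=r) \quad \text{for } x \in V_r
$$

$$
\hat{p}(s=r \mid y=r) = \frac{1}{|V_r|} \sum_{x \in V_r} p(s=r \mid x)
$$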

In the end, we just divide each class’s predicted probability by that class’s label frequency -.-
However, estimating the label frequency this way has two problems:
- the estimate can only be obtained after training, as an extra step before inference, and
- for SGDet there are no GT bounding boxes, so valid examples for the estimate are hard to obtain.
So we do data augmentation to get valid examples for the tail classes, and estimate the label frequency batch by batch during training.
We’ll call this idea Dynamic Label Frequency Estimation (DLFE).
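
The batch-by-batch idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper’s implementation: the class name `DynamicLabelFrequency`, the moving-average `momentum`, and the fallback of 1.0 for classes that have not been seen yet are all assumptions made for the example.

```python
class DynamicLabelFrequency:
    """Running per-class estimate of the label frequency p(s=r | y=r).

    Hypothetical helper for illustration only. The momentum value and the
    handling of unseen classes are assumptions, not taken from the paper.
    """

    def __init__(self, num_classes, momentum=0.9, eps=1e-12):
        self.momentum = momentum
        self.eps = eps
        self.freq = [None] * num_classes  # None means "class not seen yet"

    def update(self, probs, labels):
        """Update the estimates from one batch of valid examples.

        probs:  list of biased probability vectors p(s=. | x), one per example
        labels: list of labeled predicate indices, one per example
        """
        sums, counts = {}, {}
        for p, r in zip(probs, labels):
            # Only the probability of the labeled class contributes.
            sums[r] = sums.get(r, 0.0) + p[r]
            counts[r] = counts.get(r, 0) + 1
        for r, n in counts.items():
            batch_mean = sums[r] / n
            if self.freq[r] is None:
                self.freq[r] = batch_mean
            else:
                m = self.momentum
                self.freq[r] = m * self.freq[r] + (1.0 - m) * batch_mean

    def recover(self, probs_x):
        """Divide biased probabilities by the estimated label frequencies.

        The returned scores are not renormalized, so they need not sum to 1;
        only their ranking is used here.
        """
        return [
            p / max(f if f is not None else 1.0, self.eps)
            for p, f in zip(probs_x, self.freq)
        ]


# Usage: class 0 is a head predicate, class 2 a tail predicate.
estimator = DynamicLabelFrequency(num_classes=3)
estimator.update(
    probs=[[0.8, 0.1, 0.1], [0.6, 0.2, 0.2], [0.5, 0.2, 0.3]],
    labels=[0, 0, 2],
)
scores = estimator.recover([0.6, 0.1, 0.3])
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

Because head classes tend to have a high label frequency and tail classes a low one, the division raises tail-class scores relative to head-class scores, which is the intended debiasing effect.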


