image

paper

TL;DR

  • task : Scene Graph Generation
  • problem : SGG annotations are long-tailed: a few relations appear very often, while many true relations are left unlabeled.
  • idea : Treat the problem as Positive-Unlabeled (PU) learning and divide the predicted probability of each predicate class by that class's label frequency.
  • architecture : object detector + GNN?
  • objective : cross entropy loss
  • baseline : MOTIFS, …
  • data : Visual Genome, Visual Genome150
  • result : Appears to be the current SOTA on VG150 in the SGDet setting.
  • contribution : addresses the long-tail issue
  • Limitations or things I don’t understand :

Details

image

Recovering the Unbiased Scene Graph

image
  • s : labeled predicate (the annotation that was actually observed)
  • y : true predicate
  • r : target predicate class

unbiased probability image

If we assume that the probability of being labeled is independent of x (Selected Completely At Random, SCAR), we can write the unbiased probability as p(y=r|x) = p(s=r|x) / p(s=r|y=r).

p(s=r|y=r) is the label frequency of class r: the fraction of all examples whose true class is r that are actually labeled r.
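As a minimal sketch of this recovery step, assume the biased model gives p(s=r|x) for each predicate class and the label frequencies p(s=r|y=r) are already known. The function name `recover_unbiased`, the two classes, and all numbers below are hypothetical and only illustrate how the division can change the ranking of predicates.

```python
def recover_unbiased(biased_probs, label_freq):
    """Divide each class's biased probability p(s=r|x) by its label frequency p(s=r|y=r)."""
    if len(biased_probs) != len(label_freq):
        raise ValueError("one label frequency is needed per class")
    return [p / c for p, c in zip(biased_probs, label_freq)]


# Hypothetical example with two predicate classes: ("on", "standing on").
biased = [0.6, 0.3]      # p(s=r|x): the head class "on" looks more likely
label_freq = [0.9, 0.3]  # p(s=r|y=r): the tail class is labeled much less often

unbiased = recover_unbiased(biased, label_freq)
# Index of the highest-scoring class after the correction.
best = max(range(len(unbiased)), key=unbiased.__getitem__)
```

The corrected scores are not renormalized here, so they need not sum to 1; this sketch only uses them to rank predicates.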

Dynamic Label Frequency Estimation

Next we need an estimate of p(s=r|y=r) above, i.e., the label frequency. image

This expression is derived from image
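The formula itself is not reproduced in these notes, so the following is only a sketch under an assumption: that the estimate follows the usual PU-learning recipe of averaging the biased probability p(s=r|x) over the examples that are labeled r. The function name `estimate_label_frequency` and the toy numbers are hypothetical.

```python
def estimate_label_frequency(probs, labels, num_classes):
    """Per-class mean of p(s=r|x) over the examples labeled r.

    probs  : list of per-example probability lists, one entry per class
    labels : list of observed labels s (class indices)
    Returns None for classes that have no labeled example.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for p, s in zip(probs, labels):
        sums[s] += p[s]
        counts[s] += 1
    return [sums[r] / counts[r] if counts[r] else None for r in range(num_classes)]


# Hypothetical biased predictions for three labeled examples and two classes.
probs = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
labels = [0, 0, 1]
freq = estimate_label_frequency(probs, labels, num_classes=2)
```

With these toy numbers the head class gets an estimate near 0.7 and the tail class near 0.3, which is the kind of gap the division is meant to compensate for.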

In the end this just amounts to dividing the predictions by a per-class label frequency -.-

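Estimating the label frequency in the usual way has two problems:

  1. the estimate has to be computed in a separate step after training and before inference, which is hard to do, and
  2. in SGDET there are no ground-truth bounding boxes, so it is hard to find valid examples to estimate from.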

So the paper uses data augmentation to get valid examples for the tail classes, and estimates the label frequency batch by batch. This idea is called Dynamic Label Frequency Estimation (DLFE). image
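A rough sketch of a batch-by-batch estimate is given below. It assumes the per-batch class means are blended with an exponential moving average; the class name `DynamicLabelFrequency`, the `momentum` parameter, and the toy batches are assumptions made for illustration, not details taken from these notes.

```python
class DynamicLabelFrequency:
    """Running per-class estimate of p(s=r|y=r), updated once per batch."""

    def __init__(self, num_classes, momentum=0.9):
        self.momentum = momentum              # hypothetical smoothing factor
        self.estimate = [None] * num_classes  # None until a class is first seen

    def update(self, probs, labels):
        # Batch mean of p(s=r|x) over the examples labeled r.
        sums, counts = {}, {}
        for p, s in zip(probs, labels):
            sums[s] = sums.get(s, 0.0) + p[s]
            counts[s] = counts.get(s, 0) + 1
        for r, total in sums.items():
            batch_mean = total / counts[r]
            old = self.estimate[r]
            if old is None:
                self.estimate[r] = batch_mean
            else:
                self.estimate[r] = self.momentum * old + (1 - self.momentum) * batch_mean
        # Classes absent from this batch keep their previous estimate.
        return self.estimate


dlfe = DynamicLabelFrequency(num_classes=2, momentum=0.5)
dlfe.update([[0.8, 0.2]], [0])                   # first batch: only class 0 appears
dlfe.update([[0.4, 0.6], [0.1, 0.9]], [0, 1])    # second batch: both classes appear
```

Keeping the old value for classes missing from a batch is one simple way to handle rare tail predicates, which may not show up in every batch.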

image

image