image

paper

TL;DR

  • why I read this : #75 showed quite a bit of performance improvement, and a recent AAAI paper used this as its baseline.
  • task : two-stage SGG
  • PROBLEM : SGG data is long-tailed.
  • idea : a confidence-aware bipartite graph neural network, plus a bi-level data-resampling strategy.
  • architecture : a combination of relationship confidence estimation (RCE) and confidence-aware message propagation (CMP)
  • objective : cross-entropy loss on predicates and entities, plus a loss for relation confidence estimation (class-specific / overall)
  • baseline : Graph R-CNN, GPS-Net, Motifs, …
  • data : Visual Genome, Open Images V4/6
  • evaluation : PredCls, SGCls, SGGen(head, body, tail), OI evaluation
  • result : SOTA; tail-class scores improved a lot.
  • contribution : the confidence-aware GNN for SGG? I’m not familiar with this line of work, so I’m not sure what the core contribution is.
  • limitation / things I cannot understand : what does the confidence actually do? It looks like the confidence is supervised directly with its own loss, but how exactly is that supervision given? Is it like the “relatedness” score in Graph R-CNN?

Details

Architecture

image

Proposal generation network

Detect objects with Faster R-CNN and build an entity representation $e_i$ for each from its visual feature $v_i$, geometric feature $g_i$, and class word-embedding feature $w_i$. image
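A minimal sketch of this fusion step (the dimensions, the single linear layer, and the ReLU are my assumptions, not the paper’s exact design):

```python
import torch
import torch.nn as nn

class EntityEncoder(nn.Module):
    """Fuse visual, geometric, and word-embedding features into e_i.
    All dimensions here are illustrative assumptions."""
    def __init__(self, d_vis=1024, d_geo=4, d_word=200, d_out=512):
        super().__init__()
        self.fuse = nn.Linear(d_vis + d_geo + d_word, d_out)

    def forward(self, v, g, w):
        # v: [N, d_vis] RoI visual features from Faster R-CNN
        # g: [N, d_geo] box geometry (e.g. normalized x, y, w, h)
        # w: [N, d_word] word embedding of the detected class label
        return torch.relu(self.fuse(torch.cat([v, g, w], dim=-1)))
```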

The relation representation $r_{i \to j}$ is constructed by concatenating the entity representations $e_i$, $e_j$. Let $u_{i,j}$ be the convolutional feature of the union region of the two entities. image
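Continuing the sketch, the relation feature could be formed like this (concatenation plus projection is my guess at how $e_i$, $e_j$, and the union feature $u_{i,j}$ are combined):

```python
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    """Build r_{i->j} from the two entity vectors and the union-box feature.
    Sketch only: concatenate e_i, e_j, u_ij and project down."""
    def __init__(self, d_ent=512, d_union=512, d_out=512):
        super().__init__()
        self.proj = nn.Linear(2 * d_ent + d_union, d_out)

    def forward(self, e_i, e_j, u_ij):
        # e_i, e_j: [N, d_ent] subject/object entity representations
        # u_ij:     [N, d_union] feature of the union region of the two boxes
        return torch.relu(self.proj(torch.cat([e_i, e_j, u_ij], dim=-1)))
```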

Bipartite Graph Neural Network

  1. Relationship Confidence Estimation (RCE) module: estimate a confidence from the class probabilities of each entity $e_i$, $e_j$. image

(???) I don’t understand this part: at what point does it become global? image
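My best guess at what RCE computes, written out (the inputs, the two heads, and the sigmoids are all assumptions; this is just what the “class-specific / overall” confidence losses in the objective suggest to me):

```python
import torch
import torch.nn as nn

class RelationConfidence(nn.Module):
    """Guess at RCE: predict, from the relation feature, both a per-predicate
    confidence (class-specific) and a scalar relatedness (overall).
    The heads and their supervision are assumptions, not the paper's spec."""
    def __init__(self, d_rel=512, n_pred=50):
        super().__init__()
        self.cls_head = nn.Linear(d_rel, n_pred)  # class-specific confidence
        self.rel_head = nn.Linear(d_rel, 1)       # overall relatedness

    def forward(self, r_ij):
        s_cls = torch.sigmoid(self.cls_head(r_ij))  # [N, n_pred]
        s_all = torch.sigmoid(self.rel_head(r_ij))  # [N, 1]
        return s_cls, s_all
```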

  2. Confidence-aware message propagation (CMP) image
  • entity-to-predicate image

  • predicate-to-entity image

$\alpha$ and $\beta$ are threshold parameters.

Each entity node $e_i$ is updated by aggregating its neighbors’ messages. image
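One plausible reading of the confidence-aware gating, with $\alpha$ and $\beta$ acting as upper/lower thresholds (the linear interpolation between them, and the mean aggregation, are my assumptions):

```python
import torch

def confidence_gate(s, alpha=0.7, beta=0.3):
    """Soft-gate a message by its confidence s: pass fully above alpha,
    suppress below beta, interpolate linearly in between (my guess at how
    the two thresholds are used)."""
    return torch.clamp((s - beta) / (alpha - beta), 0.0, 1.0)

def aggregate(e_i, messages, conf):
    # messages: [K, d] messages from neighboring predicate nodes
    # conf:     [K]    per-message confidence scores
    gate = confidence_gate(conf)                      # [K]
    return e_i + (gate.unsqueeze(-1) * messages).mean(dim=0)
```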

Scene Graph Prediction

image image

Bi-level Resampling

image
  1. image-level over-sampling: compute a repeat factor per category and sample extra images for categories that appear rarely: $r^c = \max(1, \sqrt{t / f^c})$
  • $c$ : category
  • $f^c$ : frequency of category $c$ over the entire dataset
  • $t$ : hyperparameter
  2. instance-level under-sampling: drop some relation instances per image depending on the predicate class. -> Iterative SGG is one-stage, so how is this done? Do they just remove it from the GT labels?
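The image-level repeat factor $r^c = \max(1, \sqrt{t/f^c})$ can be computed like this (a sketch following the LVIS-style repeat-factor scheme; treating $f^c$ as the fraction of images containing category $c$, and taking each image’s factor as the max over its categories, are my assumptions, and the value of $t$ is illustrative):

```python
import math
from collections import Counter

def repeat_factors(images, t=0.07):
    """Per-image repeat factors for image-level over-sampling.
    images: list of per-image category lists (e.g. predicate labels).
    r^c = max(1, sqrt(t / f^c)); an image repeats by the max over
    the categories it contains, so rare categories get sampled more."""
    n = len(images)
    freq = Counter()
    for cats in images:
        for c in set(cats):
            freq[c] += 1
    r_cat = {c: max(1.0, math.sqrt(t / (cnt / n))) for c, cnt in freq.items()}
    return [max(r_cat[c] for c in set(cats)) for cats in images]
```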

Result

image image image