TL;DR

I read this because.. : #75 reported a sizable performance improvement, and a recent AAAI paper used this method as its baseline.
task : two-stage SGG
problem : SGG data is long-tailed.
idea : propose a confidence-aware bipartite graph neural network, plus a bi-level data resampling strategy.
architecture : a combination of relationship confidence estimation (RCE) and confidence-aware message propagation (CMP).
objective : cross-entropy (CE) loss for predicates and entities, plus a loss for relationship confidence estimation (class-specific / overall).
baseline : graph-RCNN, GPS-Net, Motif, …
data : Visual Genome, Open Images V4/6
evaluation : PredCls, SGCls, SGGen(head, body, tail), OI evaluation
result : SOTA; tail-class scores improved a lot.
contribution : a confidence-aware(?) GNN for SGG. I'm not familiar with the related papers, so I can't tell what counts as a contribution.
limitation / things I cannot understand : What does the confidence actually do? A loss seems to be applied directly to the confidence, but how is it supervised? Is it like the "relatedness" in graph-RCNN?

Details

Architecture

Proposal generation network

Detect objects with Faster R-CNN and build an entity representation $e_i$ for each one from its visual feature $v_i$, geometric feature $g_i$, and class word embedding feature $w_i$.

The relation representation $r_{i \to j}$ is built from the entity representations $e_i$ and $e_j$ together with $u_{i,j}$, the convolutional feature of the union region of the two entities.

Bipartite Graph Neural Network

Relationship Confidence Estimation Module: estimate a confidence score for each relationship proposal, given the class probabilities of its two entities $e_i$, $e_j$.

(???) I don’t understand this part, at what point is it global?

Confidence-aware message

entity-to-predicate
predicate-to-entity

$\alpha$ and $\beta$ are threshold parameters.

Each entity node $e_i$ is updated by aggregating its neighbors' messages.
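To make the confidence-aware aggregation concrete, here is a minimal sketch. The exact gating formula is not recorded in these notes, so the piecewise-linear gate below (zero below $\beta$, slope $\alpha$, clipped at 1) is an assumption, and the function names `confidence_gate` and `aggregate_messages` are hypothetical.

```python
def confidence_gate(s, alpha, beta):
    """Assumed piecewise-linear gate on a confidence score s.

    Returns 0 when s <= beta, rises with slope alpha above beta,
    and saturates at 1.
    """
    return min(1.0, max(0.0, alpha * (s - beta)))


def aggregate_messages(messages, confidences, alpha, beta):
    """Sum neighbor message vectors, each scaled by its gated confidence.

    messages:    list of equal-length lists of floats (one per neighbor)
    confidences: list of floats, one confidence score per neighbor
    """
    if not messages:
        return []
    out = [0.0] * len(messages[0])
    for msg, s in zip(messages, confidences):
        g = confidence_gate(s, alpha, beta)
        for k, value in enumerate(msg):
            out[k] += g * value
    return out


# Two neighbors: one confident relationship, one unreliable one.
updated = aggregate_messages(
    messages=[[1.0, 2.0], [4.0, 0.0]],
    confidences=[0.9, 0.1],
    alpha=2.0,
    beta=0.2,
)
```

With these illustrative values the low-confidence neighbor is gated to zero, so only the first message contributes to the update. This is the intuition I take from the module: noisy relationship proposals should not pollute entity features.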

Scene Graph Prediction

Bi-level Resampling

image-level over-sampling: Compute a repeat factor per category and sample images containing rarely appearing classes more often. $r^c = \max(1, \sqrt{t / f^c})$

$c$ : category
$f^c$ : frequency of category $c$ on the entire dataset
$t$ : hyperparameter
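A small sketch of the repeat-factor computation. Two things here are my assumptions rather than something stated above: $f^c$ is taken as the fraction of images that contain category $c$, and an image's repeat factor is the maximum $r^c$ over the categories it contains (the usual repeat-factor sampling convention). The function names are hypothetical.

```python
import math
from collections import Counter


def category_repeat_factors(image_categories, t):
    """r^c = max(1, sqrt(t / f^c)) for every category c.

    image_categories: list of sets, the predicate categories in each image.
    f^c is assumed to be the fraction of images containing c.
    """
    num_images = len(image_categories)
    counts = Counter(c for cats in image_categories for c in set(cats))
    return {
        c: max(1.0, math.sqrt(t / (n / num_images)))
        for c, n in counts.items()
    }


def image_repeat_factors(image_categories, t):
    """Assumed image-level factor: max r^c over categories in the image."""
    r = category_repeat_factors(image_categories, t)
    return [max((r[c] for c in cats), default=1.0) for cats in image_categories]


# "on" appears in every image, "riding" in only one of four.
images = [{"on"}, {"on"}, {"on"}, {"on", "riding"}]
factors = image_repeat_factors(images, t=0.5)
```

In this toy example the frequent category stays at a factor of 1, while the image containing the rare category gets a factor of $\sqrt{0.5 / 0.25} = \sqrt{2}$, so it would be sampled more often.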

instance-level under-sampling: Drop relationship instances within each image, with the amount depending on the predicate class. -> Iterative SGG is one-stage, so how did it do this? Did it just remove them from the GT labels?
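One way I picture instance-level under-sampling, as a rough sketch: each ground-truth triplet is dropped with a probability that depends on its predicate class. How the drop probabilities are derived is not covered in these notes, so `drop_prob` is a hypothetical input here, and `undersample_instances` is a made-up name.

```python
import random


def undersample_instances(triplets, drop_prob, rng=None):
    """Randomly drop (subject, predicate, object) triplets from one image.

    triplets:  list of (subject_id, predicate, object_id) tuples
    drop_prob: dict mapping predicate -> probability of dropping an
               instance (hypothetical input; missing predicates are kept)
    rng:       optional random.Random for reproducibility
    """
    rng = rng or random.Random()
    kept = []
    for subj, pred, obj in triplets:
        # rng.random() is in [0, 1), so a probability of 1.0 always drops
        # and a probability of 0.0 always keeps.
        if rng.random() >= drop_prob.get(pred, 0.0):
            kept.append((subj, pred, obj))
    return kept


gt = [(0, "on", 1), (2, "riding", 3), (4, "on", 5)]
kept = undersample_instances(gt, {"on": 1.0}, rng=random.Random(0))
```

Here every instance of the head predicate "on" is removed and the tail predicate "riding" survives. If this reading is right, the operation only edits the training labels, which would also answer how a one-stage model could apply it.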

[87] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
