[86] Graph R-CNN for Scene Graph Generation

paper

TL;DR

I read this because.. : it is one of the early SGG papers
task : Scene Graph Generation
problem : Detect objects and efficiently handle the quadratic number of candidate relations between them. Build an enriched graph representation.
idea : Insert a module between detection and labeling that prunes candidate relations between objects, then apply an attentional GCN.
architecture : 1) Extract objects with Faster R-CNN 2) score object pairs from their class logits to prune relations 3) apply the attentional GCN to enrich the object and relation node representations -> attach a classifier to each object and relation representation to predict its label (I think?)
objective : 1) bbox loss + cls loss 2) BCE for the relationship score 3) CE for object cls and predicate cls
baseline : IMP, MSDN, Neural Motifs
data : Visual Genome
evaluation : PredCls, PhrCls, SGGen, SGGen+(proposed in this paper)
result : SOTA
contribution : Probably the first paper to apply a GCN to SGG?
limitation/things I can’t understand : Does SGG really have enough graph structure to justify using a GCN?

Details

Architecture

The model breaks down into 3 steps

Object Region Proposal : Given an image, select the nodes (= vertices, V) => Faster R-CNN
Relationship Proposal : Given the image and the nodes, prune the n*(n-1) possible object pairs down to the likely relations (= edges, E)
Graph Labeling : Given the image, nodes, and edges, predict the object and relation labels
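
Written as a factorization, the three steps above correspond to the following. The notation is assumed here for illustration: $I$ is the image, $V$ the nodes, $E$ the edges, $O$ and $R$ the object and relation labels, and $S = (V, E, O, R)$ the scene graph.

$$
P(S \mid I) = P(V \mid I)\, P(E \mid V, I)\, P(R, O \mid V, E, I)
$$

The first factor is the object region proposal, the second the relationship proposal, and the third the graph labeling.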

Relation Proposal Network

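Measure “relatedness” from the objects’ class logits. This acts as a soft prior over which pairs are plausible (e.g., person-ride-chicken is unlikely).

A straightforward implementation concatenates the two objects' logits and feeds them to stacked MLP layers. To avoid one forward pass per pair, the paper instead projects subjects and objects with two separate 2-layer MLPs, $\Phi$ and $\Psi$, and scores a pair as $f(p_i^o, p_j^o) = \langle \Phi(p_i^o), \Psi(p_j^o) \rangle$ followed by a sigmoid. After scoring and sorting, the top K pairs are kept. Because Faster R-CNN produces many overlapping boxes, many of these pairs are near-duplicates, so NMS is applied over the pairs, leaving m pairs.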
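
A minimal NumPy sketch of the scoring, top-K selection, and pair-level NMS described above. It assumes the pairwise score is an inner product of two 2-layer MLP projections of the class logits (one for the subject side, one for the object side), and that the overlap of two pairs is the summed box intersections over the summed box unions. The function names, the omitted biases, and the 0.7 threshold are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def mlp2(x, w1, w2):
    """Two-layer MLP with a ReLU in between (biases omitted for brevity)."""
    return np.maximum(x @ w1, 0.0) @ w2

def relatedness(logits, subj_params, obj_params):
    """Score all ordered pairs (i, j) from class logits; returns an (n, n) matrix."""
    phi = mlp2(logits, *subj_params)   # subject projection, shape (n, h)
    psi = mlp2(logits, *obj_params)    # object projection, shape (n, h)
    scores = 1.0 / (1.0 + np.exp(-(phi @ psi.T)))  # sigmoid of the inner products
    np.fill_diagonal(scores, 0.0)      # an object is never paired with itself
    return scores

def top_k_pairs(scores, k):
    """Return the k highest-scoring (subject, object) index pairs, best first."""
    flat = np.argsort(-scores, axis=None)[:k]
    rows, cols = np.unravel_index(flat, scores.shape)
    return [(int(i), int(j)) for i, j in zip(rows, cols)]

def box_inter_union(a, b):
    """Intersection and union areas of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter, area_a + area_b - inter

def pair_iou(boxes, p, q):
    """Overlap of pairs p and q: subject-vs-subject and object-vs-object boxes combined."""
    i1, u1 = box_inter_union(boxes[p[0]], boxes[q[0]])
    i2, u2 = box_inter_union(boxes[p[1]], boxes[q[1]])
    return (i1 + i2) / (u1 + u2)

def pair_nms(pairs, boxes, thresh=0.7, m=None):
    """Greedy NMS over score-sorted pairs; keeps at most m pairs if m is given."""
    kept = []
    for p in pairs:
        if all(pair_iou(boxes, p, q) <= thresh for q in kept):
            kept.append(p)
        if m is not None and len(kept) == m:
            break
    return kept
```

Calling `pair_nms(top_k_pairs(relatedness(...), K), boxes, m=m)` chains the three steps. The two projections need only n MLP passes each plus one matrix product, instead of one MLP pass per pair.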

Attentional GCN

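A vanilla GCN layer updates each node as

$$
z_i^{(l+1)} = \sigma\Big(z_i^{(l)} + \sum_{j \in N(i)} \alpha_{ij} W z_j^{(l)}\Big)
$$

$z_i^{(l)}$ : representation of the i-th node at layer $l$
$N(i)$ : neighbors of the i-th node
$\alpha_{ij}$ : connection coefficient between i and j, fixed by the adjacency matrix
$W$ : learned linear transformation; $\sigma$ : nonlinearity

Stacking the node representations as the columns of a matrix $Z\in \mathbb{R}^{d\times T_n}$, this becomes

$$
z_i^{(l+1)} = \sigma\big(W Z^{(l)} \alpha_i\big)
$$

where $\alpha_i \in [0,1]^{T_n}$ is zero for non-neighbors of i and $\alpha_{ii} = 1$.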

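In the attentional GCN, $\alpha_{ij}$ is learned rather than given in advance.

A 2-layer MLP over the concatenated pair $[z_i, z_j]$, followed by a softmax, produces $\alpha_{ij}$:

$$
u_{ij} = w_h^\top \sigma\big(W_a [z_i^{(l)}, z_j^{(l)}]\big), \qquad \alpha_i = \mathrm{softmax}(u_i)
$$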
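
A minimal NumPy sketch of one attentional GCN layer, assuming a residual self-connection, ReLU as the nonlinearity, and a softmax restricted to each node's neighbors. Nodes are stored as rows here (an `(n, d)` matrix), which is the transpose of the $d\times T_n$ layout in the text. The names `W_a` and `w_h` stand for the two layers of the attention MLP and are illustrative.

```python
import numpy as np

def attention(Z, adj, W_a, w_h):
    """alpha[i, j]: softmax over the neighbors j of node i of an MLP score on [z_i, z_j]."""
    n = Z.shape[0]
    zi = np.repeat(Z[:, None, :], n, axis=1)   # (n, n, d): entry [i, j] holds z_i
    zj = np.repeat(Z[None, :, :], n, axis=0)   # (n, n, d): entry [i, j] holds z_j
    hidden = np.maximum(np.concatenate([zi, zj], axis=-1) @ W_a, 0.0)  # (n, n, h)
    u = hidden @ w_h                           # (n, n) raw attention scores
    masked = np.where(adj, u, -1e9)            # push non-neighbors far below neighbors
    e = np.exp(masked - masked.max(axis=1, keepdims=True)) * adj
    denom = e.sum(axis=1, keepdims=True)
    return e / np.where(denom > 0, denom, 1.0)  # isolated nodes get an all-zero row

def agcn_layer(Z, adj, W, W_a, w_h):
    """One layer: z_i <- ReLU(z_i + sum_j alpha_ij * W z_j)."""
    alpha = attention(Z, adj, W_a, w_h)
    return np.maximum(Z + alpha @ (Z @ W.T), 0.0)
```

Here `adj` is a boolean `(n, n)` matrix without self-loops; the self term enters through the residual `Z` rather than through `alpha`.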

aGCN for SGG

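Create one node for each of the N object regions and one for each of the m relationships, then connect object nodes to relation nodes along the edges kept by the relation proposal network above. Additionally, add direct (skip) edges between the object nodes.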
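
The graph construction above can be sketched as three boolean matrices. This is an illustrative layout, assuming the kept relationships arrive as a list of `(subject_index, object_index)` pairs; `build_graph` and the matrix names are hypothetical.

```python
import numpy as np

def build_graph(num_objects, pairs):
    """Object-object skip edges plus subject->relation and object->relation edges."""
    m = len(pairs)
    skip = ~np.eye(num_objects, dtype=bool)            # each object sees every other object
    subj_rel = np.zeros((num_objects, m), dtype=bool)  # [i, r]: object i is the subject of relation r
    obj_rel = np.zeros((num_objects, m), dtype=bool)   # [i, r]: object i is the object of relation r
    for r, (s, o) in enumerate(pairs):
        subj_rel[s, r] = True
        obj_rel[o, r] = True
    return skip, subj_rel, obj_rel
```

Each relation node ends up with exactly one subject edge and one object edge, so every column of `subj_rel` and of `obj_rel` contains a single `True`.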

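The representation for an object node is updated as follows (superscripts: s = subject, o = object, r = relation; the skip term carries messages from the other objects, the other two terms carry messages from neighboring relations)

$$
z_i^o = \sigma\big(W^{\text{skip}} Z^o \alpha^{\text{skip}} + W^{sr} Z^r \alpha^{sr} + W^{or} Z^r \alpha^{or}\big)
$$

The representation for a relation node is updated from its own state and its neighboring objects

$$
z_i^r = \sigma\big(z_i^r + W^{rs} Z^o \alpha^{rs} + W^{ro} Z^o \alpha^{ro}\big)
$$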
