paper

TL;DR

  • I read this because.. : SGG Early Papers
  • task : Scene Graph Generation
  • problem : Detect objects and cope with the quadratic number of candidate relations between them. Build an enriched graph representation.
  • idea : Insert a module in the middle that prunes candidate relations between objects, then apply an attentional GCN.
  • architecture : 1) Extract objects with Faster R-CNN 2) concatenate the objects' class logits to score and prune relations 3) apply the attentional GCN to enrich the representations of object and relation nodes -> attach a classifier to each subject, object, and relation representation to produce the predictions?
  • objective : 1) bbox loss + cls loss 2) BCE for the relationship score 3) CE for the object cls and predicate cls
  • baseline : IMP, MSDN, NeuralMotif
  • data : Visual Genome
  • evaluation : PredCls, PhrCls, SGGen, SGGen+(proposed in this paper)
  • result : SOTA
  • contribution : Probably the first paper to apply a GCN to SGG?
  • limitation/things I can’t understand : Does SGG really have enough graph structure to justify using a GCN?

Details

Architecture

image image image

The pipeline breaks down into 3 steps:

  1. Object Region Proposal : Given an image, select the nodes (= vertices, V) => Faster R-CNN
  2. Relationship Proposal : Given the image and the nodes, prune the n*(n-1) possible ordered object pairs down to a sparse set of candidate edges
  3. Graph Labeling : Given the image, nodes, and edges, classify the objects and the relations

Relation Proposal Network

Measure the “relatedness” of each object pair from the objects’ class logits. This acts as a kind of soft prior over which pairs are plausible (e.g., person-ride-chicken is unlikely?). image
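A minimal sketch of the scoring step, assuming the pair score comes from a small two-layer MLP over the concatenated subject and object logits followed by a sigmoid. The function names, layer sizes, and random weights are hypothetical and only illustrate the data flow, not the paper's trained model.

```python
import math
import random

def mlp(x, w1, w2):
    """Two-layer MLP: ReLU hidden layer, then a single linear output unit."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

def relatedness(logits, w1, w2):
    """Score every ordered pair (i, j), i != j, from concatenated class logits."""
    scores = {}
    for i, subj in enumerate(logits):
        for j, obj in enumerate(logits):
            if i == j:
                continue  # an object is never paired with itself
            raw = mlp(subj + obj, w1, w2)  # list concat: [subject logits, object logits]
            scores[(i, j)] = 1.0 / (1.0 + math.exp(-raw))  # sigmoid squashes to [0, 1]
    return scores

random.seed(0)
num_classes, hidden_dim = 4, 8  # hypothetical sizes
w1 = [[random.uniform(-1, 1) for _ in range(2 * num_classes)] for _ in range(hidden_dim)]
w2 = [random.uniform(-1, 1) for _ in range(hidden_dim)]
logits = [[random.uniform(-2, 2) for _ in range(num_classes)] for _ in range(3)]

pair_scores = relatedness(logits, w1, w2)
print(len(pair_scores))  # 3 objects give 3 * 2 = 6 ordered pairs
```

Because the pairs are ordered, (i, j) and (j, i) get separate scores, which matches the subject/object asymmetry of a relation.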

The implementation concatenates the logits and then stacks MLPs on top. After scoring and sorting, we pick out the top K pairs. Since the boxes come from Faster R-CNN, many of these pairs overlap heavily, so we run NMS over the pairs and keep only m of them. image
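The selection step can be sketched as follows: sort by score, keep the top K, then greedily suppress pairs that overlap an already kept pair. The pair-overlap measure used here (the smaller of the subject-subject and object-object IoU) and the threshold are assumptions chosen for simplicity, not necessarily the measure used in the paper.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def pair_overlap(p, q, boxes):
    """Assumed pair overlap: both subjects AND both objects must overlap."""
    return min(iou(boxes[p[0]], boxes[q[0]]), iou(boxes[p[1]], boxes[q[1]]))

def select_pairs(scores, boxes, k, m, thresh=0.7):
    """Top-k by score, then greedy NMS over pairs until m pairs are kept."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    kept = []
    for pair in ranked:
        if all(pair_overlap(pair, other, boxes) <= thresh for other in kept):
            kept.append(pair)
        if len(kept) == m:
            break
    return kept

# Toy input: box 1 is a near-duplicate of box 0, box 2 is far away.
boxes = [(0, 0, 10, 10), (0, 0, 10, 9.5), (20, 20, 30, 30)]
scores = {(0, 2): 0.9, (1, 2): 0.8, (2, 0): 0.7, (0, 1): 0.6, (2, 1): 0.5, (1, 0): 0.4}
kept = select_pairs(scores, boxes, k=5, m=3)
print(kept)
```

In the toy input, the pair (1, 2) is suppressed because it duplicates the higher-scoring pair (0, 2) almost exactly.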

Attentional GCN

A vanilla GCN layer looks like this: image

  • $z_i$ : representation of the i-th node
  • $N(i)$ : neighbors of the i-th node
  • $\alpha_{ij}$ : coefficient for the connection between nodes i and j, fixed in advance by the adjacency matrix

If we stack the node representations into a matrix $Z\in \mathbb{R}^{d\times T_n}$, then we get image
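For reference, one common way to write this update with the symbols defined above is sketched here. The activation $\sigma$, the weight matrix $W$, the layer index $(l)$, and the convention $\alpha_{ii}=1$ are assumptions added for illustration.

$$
z_i^{(l+1)} = \sigma\Big( \sum_{j \in N(i) \cup \{i\}} \alpha_{ij}\, W z_j^{(l)} \Big)
\quad\Longleftrightarrow\quad
z_i^{(l+1)} = \sigma\big( W Z^{(l)} \alpha_i \big)
$$

Here $\alpha_i \in \mathbb{R}^{T_n}$ collects the coefficients $\alpha_{ij}$ for node i, with zeros for every j that is neither i itself nor in $N(i)$, so the matrix product on the right reproduces the sum on the left term by term.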

Here the goal is to learn $\alpha_{ij}$ rather than take it as given: image

A 2-layer MLP followed by a softmax produces $\alpha_{ij}$.
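A minimal sketch of learned coefficients, assuming the 2-layer MLP takes the concatenation of $z_i$ and $z_j$ and the softmax runs over the neighbors of node i. The weight names `w_a` and `w_h`, the ReLU, and all sizes are hypothetical.

```python
import math
import random

def attention_coeffs(z, neighbors, w_a, w_h):
    """For each node i, softmax-normalised weights over its neighbours j."""
    alpha = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            alpha[i] = {}  # isolated node: nothing to attend to
            continue
        logits = []
        for j in nbrs:
            pair = z[i] + z[j]  # list concat: [z_i, z_j]
            hidden = [max(0.0, sum(w * v for w, v in zip(row, pair))) for row in w_a]
            logits.append(sum(w * h for w, h in zip(w_h, hidden)))  # second layer
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(u - peak) for u in logits]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha

random.seed(0)
dim, hidden_dim = 3, 5  # hypothetical sizes
z = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
w_a = [[random.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(hidden_dim)]
w_h = [random.uniform(-1, 1) for _ in range(hidden_dim)]
neighbors = {0: [1, 2], 1: [0], 2: [0, 1]}

alpha = attention_coeffs(z, neighbors, w_a, w_h)
print(alpha[0])
```

A node with a single neighbor always assigns it weight 1, so the attention only changes the result where a node has several neighbors to choose between.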

aGCN for SGG

Create one node for each of the N object regions and each of the m relationships, and connect them using the edges kept by the relation proposal network above. Additionally, add direct edges between object nodes.
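The graph described above can be sketched as a small data structure. The node tags, edge-type names, and the choice to connect every ordered pair of objects with a direct edge are assumptions made for illustration.

```python
def build_graph(num_objects, kept_pairs):
    """Return node lists and typed directed edges for the scene-graph skeleton."""
    object_nodes = [("obj", i) for i in range(num_objects)]
    relation_nodes = [("rel", k) for k in range(len(kept_pairs))]
    edges = {"subj->rel": [], "rel->obj": [], "obj->obj": []}
    # Each kept (subject, object) pair becomes one relation node with two edges.
    for k, (s, o) in enumerate(kept_pairs):
        edges["subj->rel"].append((("obj", s), ("rel", k)))
        edges["rel->obj"].append((("rel", k), ("obj", o)))
    # Direct edges between objects (assumed here: every ordered pair).
    for i in range(num_objects):
        for j in range(num_objects):
            if i != j:
                edges["obj->obj"].append((("obj", i), ("obj", j)))
    return object_nodes, relation_nodes, edges

objs, rels, edges = build_graph(3, [(0, 2), (2, 0)])
print(len(objs), len(rels), {name: len(e) for name, e in edges.items()})
```

Keeping the edge types separate makes it easy to give each message direction its own weights and attention in the update step.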

The representation for an object node is as follows: image

The representation for a relation node is as follows: image

Result

image image

Ablation for modules

image

image