image

paper

TL;DR

  • I read this because.. : early two-stage SGG papers
  • task : two-stage SGG
  • problem : Builds on earlier two-stage SGG work; before this paper there were Neural Motifs, #104, SGG with iterative message passing, etc.
  • idea : Enrich each object with a relation-aware embedding, then predict from it!
  • architecture : Faster R-CNN + build an embedding for each object and use it to classify relations over the $O(n^2)$ object pairs. The embedding combines global image features, the class embedding from the detector's predicted class, RoI visual features, and relative geometric information.
  • objective : 1) image-level multi-label loss over object classes 2) per-object classification loss 3) relation classification loss
  • baseline : Neural Motifs, #104, SGG with iterative message passing
  • data : Visual Genome
  • evaluation : SGdet, SGcls, PredCls
  • result : sota
  • contribution : simple !
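Since the relation classifier runs over every ordered object pair, the candidate set grows as $O(n^2)$ in the number of detected objects. A minimal sketch of that enumeration (the helper name is hypothetical, and whether self-pairs are excluded is my assumption):

```python
# Hypothetical sketch: n detected objects yield n*(n-1) ordered
# (subject, object) candidate pairs for relation classification.
from itertools import permutations

def relation_pairs(object_ids):
    """Return every ordered (subject, object) pair, excluding self-pairs."""
    return list(permutations(object_ids, 2))

pairs = relation_pairs([0, 1, 2])  # 3 objects -> 6 ordered pairs
```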

Details

Architecture

image

  • Global Context Encoding Module: AvgPool the backbone feature map, then an FC layer trained with image-level multi-label classification.

  • Relation Embedding Module: To build each object feature $O_i$, combine the embedding of the detector's predicted class $l_i$, the RoI-pooled visual feature, and the image-wide context feature $c$, then stack FC layers on top to predict the object class.
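The two modules above can be sketched roughly as below. This is my reading, not the paper's exact implementation: the weight names (`W_g`, `W_o`), shapes, and the single FC layer standing in for the stacked FCs are all assumptions.

```python
import numpy as np

def global_context(feature_map, W_g, b_g):
    """Sketch of the Global Context Encoding module (assumed shapes):
    global average pooling over the spatial dims, then one FC layer whose
    sigmoid outputs are trained with an image-level multi-label loss.
    feature_map: (C, H, W); W_g: (num_obj_classes, C); b_g: (num_obj_classes,)."""
    c = feature_map.mean(axis=(1, 2))        # (C,) pooled context vector
    logits = W_g @ c + b_g                   # image-level class logits
    probs = 1.0 / (1.0 + np.exp(-logits))    # sigmoid for multi-label cls
    return c, probs

def object_embedding(cls_embed, roi_feat, c, W_o):
    """Sketch of the per-object feature O_i: concatenate the predicted-class
    embedding l_i, the RoI-pooled visual feature, and the global context c,
    then apply an FC layer (standing in for the stacked FCs)."""
    x = np.concatenate([cls_embed, roi_feat, c])
    return np.maximum(W_o @ x, 0.0)          # FC + ReLU
```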

image

image

image

Geometric features (the relative layout of the two boxes) are included when building the relation embedding.
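One common way to encode such relative geometry is normalized center offsets plus log size ratios; I am assuming this form here, the paper's exact formula may differ:

```python
import math

def geometric_feature(sub_box, obj_box):
    """Hypothetical relative-geometry encoding between subject and object
    boxes given as (x, y, w, h): normalized offsets and log size ratios."""
    xs, ys, ws, hs = sub_box
    xo, yo, wo, ho = obj_box
    return [(xo - xs) / ws,        # horizontal offset, scale-normalized
            (yo - ys) / hs,        # vertical offset, scale-normalized
            math.log(wo / ws),     # width ratio (log for symmetry)
            math.log(ho / hs)]     # height ratio
```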

Loss

image

Result

image