
TL;DR
- I read this because : it is one of the early two-stage SGG papers
- task : two-stage SGG
- problem : builds on prior two-stage SGG work; before this paper there were Neural Motifs, #104, SGG with Iterative Message Passing, etc.
- idea : build an enhanced embedding for each object and predict from it!
- architecture : Faster R-CNN + an embedding representing each object, used to classify the relation for each of the $O(n^2)$ object pairs. The embedding combines the global image feature, the embedding of the class predicted by the detector, the RoI visual feature, and relative geometric information.
- objective : 1) image-level multi-label object-class loss 2) per-object classification loss 3) relation classification loss
- baseline : Neural Motifs, #104, SGG with Iterative Message Passing
- data : Visual Genome
- evaluation : SGdet, SGcls, PredCls
- result : SOTA
- contribution : simplicity!
Details
Architecture

Global Context Encoding Module: average-pool the backbone feature map, then an FC layer for image-level multi-label classification.
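A minimal NumPy sketch of this module: global average pooling over the spatial dimensions followed by a fully connected layer with per-class sigmoids (multi-label, so no softmax). All shapes and weights are toy values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy backbone feature map: (C, H, W). Sizes are illustrative only.
C, H, W, num_classes = 8, 4, 4, 5
feat = rng.standard_normal((C, H, W))

# Global average pooling over the spatial dims -> (C,)
pooled = feat.mean(axis=(1, 2))

# FC layer for image-level multi-label classification (hypothetical weights).
W_fc = rng.standard_normal((num_classes, C))
b_fc = np.zeros(num_classes)
logits = W_fc @ pooled + b_fc

# Multi-label output: an independent sigmoid per class.
probs = 1.0 / (1.0 + np.exp(-logits))
```

The sigmoid (rather than softmax) head is what makes this a multi-label classifier: each object category is predicted present/absent independently.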
Relation Embedding Module: for each object $i$, the object feature $o_i$ is built from the embedding of the detector's predicted class $l_i$, the RoI-pooled visual feature, and the image-wide context feature $c$; stacked FC layers on top of this embedding then predict the object class.
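A sketch of that per-object embedding, assuming a simple concatenate-then-MLP form (dimensions, a single ReLU hidden layer, and all weights are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

emb_dim, roi_dim, ctx_dim, hid, num_obj_cls = 16, 32, 8, 24, 10

# Hypothetical inputs for one detected object i.
label_emb = rng.standard_normal(emb_dim)  # embedding of the predicted class l_i
roi_feat = rng.standard_normal(roi_dim)   # RoI-pooled visual feature
ctx_feat = rng.standard_normal(ctx_dim)   # image-wide context feature c

# Object embedding o_i: stacked FC layers over the concatenation.
x = np.concatenate([label_emb, roi_feat, ctx_feat])
W1 = rng.standard_normal((hid, x.size))
b1 = np.zeros(hid)
W2 = rng.standard_normal((num_obj_cls, hid))
b2 = np.zeros(num_obj_cls)

h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
obj_logits = W2 @ h + b2          # refined object-class logits
```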



Relative geometric features of the box pair are also included when classifying a relation.
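One common way to encode the relative geometry of a subject/object box pair in SGG is normalized center offsets plus log size ratios; the exact encoding below is an assumption, not the paper's formula.

```python
import math

def rel_geometry(box_s, box_o):
    """Relative geometry of an (x, y, w, h) box pair (hypothetical encoding)."""
    xs, ys, ws, hs = box_s
    xo, yo, wo, ho = box_o
    return [
        (xo - xs) / ws,     # x offset, normalized by subject width
        (yo - ys) / hs,     # y offset, normalized by subject height
        math.log(wo / ws),  # log width ratio
        math.log(ho / hs),  # log height ratio
    ]

print(rel_geometry((10, 10, 20, 20), (20, 15, 10, 40)))
```

Because offsets are normalized and sizes enter as ratios, the feature is invariant to image-wide translation and scaling of the pair.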

Loss
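The three loss terms from the TL;DR (image-level multi-label loss, per-object classification loss, relation classification loss) can be sketched as below; the toy inputs and unweighted sum are assumptions, since the note does not give the balancing weights.

```python
import numpy as np

def bce(probs, targets):
    """Multi-label (per-class binary cross-entropy) image-level loss."""
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)
    return float(-(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)).mean())

def ce(logits, target):
    """Softmax cross-entropy for a single ground-truth label."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return float(-logp[target])

# Toy terms for one training example (values are illustrative only).
L_img = bce(np.array([0.9, 0.2]), np.array([1.0, 0.0]))  # image-level multi-label
L_obj = ce(np.array([2.0, 0.5, -1.0]), 0)                # per-object class
L_rel = ce(np.array([0.1, 1.5]), 1)                      # relation class
total = L_img + L_obj + L_rel
```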

Result
