[95] Pixels to Graphs by Associative Embedding

TL;DR

I read this because.. : it is an early SGG paper
task : one-stage SGG
problem : Detect objects without an RPN and detect relations as well
IDEA : Borrow the idea of associative embeddings from multi-person pose estimation, where the network groups body joints with similar embeddings as belonging to the same person.
architecture : hourglass + CNN + 1x1 conv to generate heatmaps of likely objects and likely relations. Features are read at the GT locations for training and at the top-k activated pixels for inference. The object head predicts an anchor-based box regression and a cls id. The relation head predicts the relation class and the subject / object ids.
objective : bbox regression loss + sigmoid loss for heatmap + CE for subject / object id + pull-together loss + push-apart loss
baseline : VRD with Language Priors, Scene Graph Generation by Iterative Message Passing
data : Visual Genome
evaluation : SGGen, SGCls, PredCls
result : SOTA
contribution : first one-stage SGG
limitation/things I cannot understand : The same feature vector has to drive the predictions and is also subject to the extra losses that pull embeddings together and push them apart, and these objectives seem to point in different directions. It is interesting that they can be learned in one space.

A screenshot included just because the picture is cute

The hourglass network is similar to U-Net. It is used because pose estimation needs both local and global information.

Detecting graph elements : image -> hourglass network -> CNN -> 1x1 conv + sigmoid to produce heatmaps for objects and relations (a relation's location is defined as the midpoint of its sbj and obj) -> (for training) features are read at the GT vertex and edge locations, and then
1) obj predicts anchor-based offset regression, cls, and id, following the Faster R-CNN approach
2) rel predicts rel cls, sbj (src in the paper) id, and obj (dest in the paper) id
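To make the heatmap step concrete, here is a minimal NumPy sketch of the two pieces described above: picking the top-k activated pixels at inference and placing a relation at the midpoint of its subject and object. The function names `top_k_pixels` and `edge_location` are hypothetical, and flooring the midpoint to an integer pixel is an assumption of this sketch, not something taken from the paper.

```python
import numpy as np

def top_k_pixels(heatmap, k):
    """Return (row, col) coordinates of the k highest activations, best first.

    Assumes k >= 1 and a 2D heatmap of sigmoid scores.
    """
    flat = heatmap.ravel()
    k = min(k, flat.size)
    # argpartition finds the k largest scores without sorting everything;
    # the k candidates are then sorted by descending score.
    idx = np.argpartition(-flat, k - 1)[:k]
    idx = idx[np.argsort(-flat[idx])]
    rows, cols = np.unravel_index(idx, heatmap.shape)
    return np.stack([rows, cols], axis=1)

def edge_location(subject_center, object_center):
    """Relation location as the midpoint of the two vertex centers.

    Flooring to an integer pixel is an assumption of this sketch.
    """
    s = np.asarray(subject_center)
    o = np.asarray(object_center)
    return (s + o) // 2

heatmap = np.zeros((4, 4))
heatmap[1, 2] = 0.9   # strongest object candidate
heatmap[3, 0] = 0.7   # second candidate
peaks = top_k_pixels(heatmap, 2)
midpoint = edge_location((2, 2), (6, 4))
```

During training the GT locations would be used in place of `top_k_pixels`, as noted above.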
Connecting elements with associative embeddings : The step above only produces separate object and relation predictions, so they still need to be linked. Each vertex gets a vector embedding that is trained to differ from vertex to vertex, and each edge has to output embeddings that identify its subject and object. To make this work, a pull-together loss and a push-apart loss are added.

$h_i\in\mathbb{R}^d$ : embedding of vertex $v_i$
$h'_{ik}$ : embeddings from the edges connected to vertex $v_i$, for $k=1,\dots,K_i$
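With this notation, the pull and push terms can be sketched in NumPy as below. This is a rough illustration under assumptions: the pull term is taken as the mean squared distance between each $h'_{ik}$ and its $h_i$, and the push term as a margin-based hinge over vertex pairs. The exact form used in the paper may differ, and the names `pull_loss`, `push_loss`, `match_to_vertex`, and the default `margin` are hypothetical.

```python
import numpy as np

def pull_loss(vertex_emb, edge_emb, edge_vertex):
    """Mean squared distance between each edge reference h'_ik and its vertex h_i.

    vertex_emb: (n, d) vertex embeddings h_i
    edge_emb: (m, d) embeddings predicted by edges, m = sum of K_i
    edge_vertex: (m,) index i of the vertex each edge embedding refers to
    """
    diff = vertex_emb[edge_vertex] - edge_emb
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def push_loss(vertex_emb, margin=1.0):
    """Hinge penalty for every pair of vertices closer than `margin` (assumed form)."""
    n = len(vertex_emb)
    dist = np.linalg.norm(vertex_emb[:, None, :] - vertex_emb[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=1)  # each unordered pair once
    return float(np.sum(np.maximum(0.0, margin - dist[i, j])))

def match_to_vertex(edge_emb, vertex_emb):
    """At inference, assign each edge embedding to its nearest vertex embedding."""
    dist = np.linalg.norm(edge_emb[:, None, :] - vertex_emb[None, :, :], axis=-1)
    return np.argmin(dist, axis=1)

vertex_emb = np.array([[0.0, 0.0], [3.0, 4.0]])
edge_emb = np.array([[0.0, 1.0], [3.0, 4.0]])
edge_vertex = np.array([0, 1])

l_pull = pull_loss(vertex_emb, edge_emb, edge_vertex)
l_push = push_loss(vertex_emb, margin=6.0)
assigned = match_to_vertex(edge_emb, vertex_emb)
```

The pull term only touches edge/vertex pairs that belong together, while the push term only touches vertex/vertex pairs, which is one way to see how both can act on the same embedding space.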

To allow different Nodes to have different embeddings, you can use the