image

paper

TL;DR

  • I read this because... : Mentioned in #58. From the name, I had a feeling it is related to SGG
  • task : object detection
  • problem : It was long believed that modeling the relations between objects would improve object recognition, but no work had shown that it helps in the deep learning era. SOTA object detectors still recognize each instance individually.
  • idea : Use an attention module to compute the relations between objects, then take a weighted sum over the other objects' features to strengthen each object's feature vector
  • architecture : CNN -> RPN -> RoI -> FC -> object relation module -> FC -> object relation module -> cls / bbox prediction -> duplicate removal network
  • objective : cross-entropy loss for classification, BCE (binary cross-entropy) for the duplicate removal network
  • baseline : Faster R-CNN, Feature Pyramid Network (FPN), Deformable Convolutional Network (DCN)
  • data : COCO
  • evaluation : mAP, mAP50, mAP75
  • result : SOTA. Best on mAP. mAP50 is best when the duplicate removal network is trained with threshold 0.5, and mAP75 is best when it is trained with threshold 0.75
  • contribution : first fully end-to-end object detector (without NMS)
  • limitation / things I cannot understand : duplicate removal network

Details

image

Object Relation Module

image image
  • $f_R$ : relation feature
  • $f_G$ : geometric feature
  • $f_A$ : appearance feature
  • $w^{mn}$ : relation weight, i.e. how much the $m$-th object affects the $n$-th object
image

$w_A^{mn}$ is computed just like scaled dot-product attention image

$w_G^{mn}$ is obtained by extracting the relative geometry features of the two boxes, embedding them with sine/cosine functions ($\varepsilon_G$), multiplying by $W_G$, and applying ReLU image
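The geometric weight described above can be sketched in plain Python. This is only a minimal sketch under assumptions: the 4-d relative geometry feature (log offsets and log size ratios) follows the paper, but the `(cx, cy, w, h)` box format, the clamp `eps`, the embedding size of 16 values per scalar, the `wavelength` constant, and the function names are all illustrative choices, not the authors' code.

```python
import math

def relative_geometry(box_m, box_n, eps=1e-3):
    """4-d relative geometry between two boxes given as (cx, cy, w, h).

    eps clamps the offsets so that log() is defined when centers coincide
    (the clamp value is an assumption of this sketch).
    """
    xm, ym, wm, hm = box_m
    xn, yn, wn, hn = box_n
    return [
        math.log(max(abs(xm - xn) / wm, eps)),
        math.log(max(abs(ym - yn) / hm, eps)),
        math.log(wn / wm),
        math.log(hn / hm),
    ]

def sincos_embed(feat, dim_per_value=16, wavelength=1000.0):
    """Embed each scalar with sines and cosines at geometrically spaced frequencies."""
    half = dim_per_value // 2
    freqs = [wavelength ** (-i / half) for i in range(half)]
    out = []
    for v in feat:
        out += [math.sin(v * f) for f in freqs]
        out += [math.cos(v * f) for f in freqs]
    return out

def geometric_weight(box_m, box_n, w_g):
    """w_G^{mn} = ReLU(W_G . embedding); w_g is a learned vector (hypothetical values here)."""
    emb = sincos_embed(relative_geometry(box_m, box_n))
    return max(0.0, sum(w * e for w, e in zip(w_g, emb)))
```

Because of the ReLU, a pair of boxes whose projected embedding is negative gets weight 0, so that pair contributes nothing to the relation, regardless of appearance.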

Aggregating the features image

In the end, the $N_r$ relation features computed for object $n$ are concatenated and added to its original appearance feature to give the new $f_A^n$.
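Putting the pieces of this section together, the module can be sketched as below. This is a rough sketch, not the authors' implementation: it assumes the geometric weights `w_g[m][n]` are already computed, uses plain Python lists in place of tensors, and the helper names (`matvec`, `relation_weights`, `relation_feature`, `augment`) are made up for illustration. The weights follow $w^{mn} = w_G^{mn} \exp(w_A^{mn}) / \sum_k w_G^{kn} \exp(w_A^{kn})$.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relation_weights(f_a, w_g, W_K, W_Q):
    """Return omega, where omega[m][n] is the influence of object m on object n."""
    keys = [matvec(W_K, f) for f in f_a]
    queries = [matvec(W_Q, f) for f in f_a]
    d_k = len(keys[0])
    n_obj = len(f_a)
    omega = [[0.0] * n_obj for _ in range(n_obj)]
    for n in range(n_obj):
        # Appearance weight: scaled dot product between key m and query n.
        w_a = [sum(k * q for k, q in zip(keys[m], queries[n])) / math.sqrt(d_k)
               for m in range(n_obj)]
        shift = max(w_a)  # subtracting the max cancels in the ratio; avoids overflow
        unnorm = [w_g[m][n] * math.exp(w_a[m] - shift) for m in range(n_obj)]
        total = sum(unnorm)
        for m in range(n_obj):
            omega[m][n] = unnorm[m] / total if total > 0 else 0.0
    return omega

def relation_feature(f_a, omega, W_V, n):
    """f_R(n): weighted sum over all objects m of W_V f_A^m."""
    values = [matvec(W_V, f) for f in f_a]
    return [sum(omega[m][n] * values[m][j] for m in range(len(f_a)))
            for j in range(len(values[0]))]

def augment(f_a, heads, n):
    """New f_A^n = f_A^n + Concat of the relation features from every head.

    heads is a list of (omega, W_V) pairs, one per relation.
    """
    concat = []
    for omega, W_V in heads:
        concat += relation_feature(f_a, omega, W_V, n)
    assert len(concat) == len(f_a[n]), "head output sizes must sum to the feature size"
    return [a + r for a, r in zip(f_a[n], concat)]
```

Each head's `W_V` projects to a fraction of the feature dimension, so that the concatenation has the same size as $f_A^n$ and the module can be inserted anywhere without changing the feature shape.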

Relation for Instance Recognition

image

Relation for Duplicate Removal

image

Nothing special: the network just makes a binary {0, 1} prediction (correct vs. duplicate) for each detection. But since the relation module lets detections attend to each other, duplicates can be removed well.

  • rank feature : Embedding the rank of each detection (sorted by score) worked better than feeding the score value directly.
  • Detections are labeled correct or duplicate depending on the IoU threshold, and which threshold gives the best result differs for AP50, AP75…
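To make the threshold-dependent labeling concrete, here is a small sketch of how correct/duplicate labels and the BCE objective could be computed. It rests on assumptions: boxes are `(x1, y1, x2, y2)`, each detection is matched to the ground-truth box it overlaps most (an illustrative tie-breaking choice), and among detections matched to the same ground truth with IoU at or above `eta`, only the highest-scoring one is labeled correct. The function names are hypothetical.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def duplicate_labels(dets, scores, gts, eta=0.5):
    """Label each detection 1 (correct) or 0 (duplicate / unmatched)."""
    labels = [0] * len(dets)
    if not gts:
        return labels
    best_for_gt = {}  # ground-truth index -> index of best-scoring matched detection
    for i, box in enumerate(dets):
        ious = [iou(box, g) for g in gts]
        j = max(range(len(gts)), key=lambda k: ious[k])
        if ious[j] >= eta and (j not in best_for_gt
                               or scores[i] > scores[best_for_gt[j]]):
            best_for_gt[j] = i
    for i in best_for_gt.values():
        labels[i] = 1
    return labels

def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

With one ground-truth box and two overlapping detections, a loose threshold keeps the higher-scoring detection, while a strict threshold keeps the better-localized one. The same detections get different labels, which is why the best training threshold differs between AP50 and AP75.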

Result

image image image