[88] Relation Networks for Object Detection

TL;DR

I read this because.. : mentioned in #58. The name made me think it might be related to SGG (scene graph generation).
task : object detection
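problem : there was an intuition that modeling relations between objects would help object recognition, but no work had actually demonstrated it. SOTA object detectors model each instance separately.
idea : use an attention module to compute relations between objects, then take a weighted sum over the other objects' features to augment each object's feature vector.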
architecture : CNN -> RPN -> RoI -> FC -> object relation module -> FC -> object relation module -> cls / bbox prediction -> duplicate removal network
objective : BCE for the duplicate removal network, cross-entropy loss (classification)
baseline : Faster R-CNN, feature pyramid network (FPN), deformable convolutional network (DCN)
data : COCO
evaluation : mAP, mAP50, mAP75
result : SOTA. Best mAP; training the duplicate removal network with an IoU threshold of 0.5 gives the best mAP50, and with 0.75 the best mAP75.
contribution : first fully end-to-end object detector (without NMS)
limitation / things I cannot understand : duplicate removal network

$w_A^{mn}$ (appearance weight) is essentially scaled dot-product attention.
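A minimal pure-Python sketch of $w_A^{mn} = \dfrac{\langle W_K f_A^m,\ W_Q f_A^n \rangle}{\sqrt{d_k}}$ as I understand it from the paper. The toy matrices and sizes are made up; in the real model $W_K$ and $W_Q$ are learned projections applied to RoI features.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def appearance_weight(f_m, f_n, W_K, W_Q):
    """Scaled dot product between the key of object m and the query of object n."""
    k = matvec(W_K, f_m)  # key projection of object m
    q = matvec(W_Q, f_n)  # query projection of object n
    d_k = len(k)
    return sum(a * b for a, b in zip(k, q)) / math.sqrt(d_k)

# Toy example: 3-d appearance features projected to d_k = 2.
W_K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_Q = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(appearance_weight([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], W_K, W_Q))
```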

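$w_G^{mn}$ (geometric weight) is obtained by extracting a geometry feature from the two boxes and embedding it with sine/cosine functions (these two steps together form $\varepsilon_G$), then multiplying by $W_G$ and applying ReLU.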
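A sketch of $w_G^{mn} = \max\{0,\ W_G \cdot \varepsilon_G(f_G^m, f_G^n)\}$ for a single relation head (one row of $W_G$), taking the 4-d relative geometry feature as input. The transformer-style sin/cos embedding here uses an embedding size per value and a wavelength base of 1000 that are assumptions for illustration, not values taken from the paper.

```python
import math

def sinusoid_embed(values, dim_per_value=4, base=1000.0):
    """Embed each scalar with sin/cos at several wavelengths (transformer-style)."""
    out = []
    half = dim_per_value // 2
    for v in values:
        for i in range(half):
            freq = 1.0 / (base ** (i / half))
            out.append(math.sin(v * freq))
            out.append(math.cos(v * freq))
    return out

def geometric_weight(geo_feat, W_G_row):
    """w_G = ReLU(W_G_row . eps_G(geo_feat)) for one relation head."""
    emb = sinusoid_embed(geo_feat)
    z = sum(w * e for w, e in zip(W_G_row, emb))
    return max(0.0, z)  # ReLU zeroes out relations with negative geometric score

# Toy example: 4-d geometry feature -> 16-d embedding -> scalar weight.
print(geometric_weight([0.0, 0.0, 0.0, 0.0], [1.0] * 16))
```

The ReLU means some object pairs get a geometric weight of exactly zero, so they contribute nothing to the relation feature.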

Features extracted for $w_G$ (figure not preserved)
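As I recall, the paper uses a 4-d relative geometry feature between boxes $m$ and $n$: $\left(\log\frac{|x_m - x_n|}{w_m},\ \log\frac{|y_m - y_n|}{h_m},\ \log\frac{w_n}{w_m},\ \log\frac{h_n}{h_m}\right)$. A small sketch, assuming $(x, y)$ are box centers and clamping the center offsets with a hypothetical `eps` so the log stays finite when two centers coincide:

```python
import math

def relative_geometry(box_m, box_n, eps=1e-3):
    """4-d relative geometry feature; boxes are (x_center, y_center, w, h).

    eps is an assumption of this sketch: it keeps log() finite when the
    horizontal or vertical center offset is zero.
    """
    xm, ym, wm, hm = box_m
    xn, yn, wn, hn = box_n
    return [
        math.log(max(abs(xm - xn) / wm, eps)),  # horizontal offset, scaled by w_m
        math.log(max(abs(ym - yn) / hm, eps)),  # vertical offset, scaled by h_m
        math.log(wn / wm),                      # width ratio
        math.log(hn / hm),                      # height ratio
    ]

print(relative_geometry((0, 0, 2, 2), (2, 4, 4, 1)))
```

The features are scale-invariant: scaling both boxes by the same factor leaves all four values unchanged.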

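Finally, $f_A^n$ is updated by concatenating the relation features computed this way from all $N_r$ relation heads and adding the result to the original $f_A^n$.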
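A rough pure-Python sketch of the whole module, assuming the paper's combined weight $w^{mn} = \dfrac{w_G^{mn} \exp(w_A^{mn})}{\sum_k w_G^{kn} \exp(w_A^{kn})}$, the per-head output $f_R(n) = \sum_m w^{mn} (W_V f_A^m)$, and the residual update $f_A^n \leftarrow f_A^n + \mathrm{Concat}[f_R^1(n), \dots, f_R^{N_r}(n)]$. The weights and value vectors $W_V f_A^m$ are passed in precomputed, and the toy numbers are made up.

```python
import math

def relation_head(w_A, w_G, values, n):
    """One relation head for object n.

    w_A[m][n], w_G[m][n]: appearance / geometric weights (precomputed).
    values[m]: the value-projected feature W_V f_A^m of object m.
    """
    N = len(values)
    dim = len(values[0])
    a_max = max(w_A[m][n] for m in range(N))  # subtract max for numerical stability
    raw = [w_G[m][n] * math.exp(w_A[m][n] - a_max) for m in range(N)]
    total = sum(raw)
    if total == 0.0:  # every geometric weight was zeroed by ReLU
        return [0.0] * dim
    return [sum(raw[m] / total * values[m][d] for m in range(N)) for d in range(dim)]

def relation_module(f_A, heads):
    """f_A^n + Concat[f_R^1(n), ..., f_R^{N_r}(n)] for every object n.

    heads: list of (w_A, w_G, values) triples, one per relation head.
    """
    out = []
    for n, f_n in enumerate(f_A):
        concat = []
        for w_A, w_G, values in heads:
            concat.extend(relation_head(w_A, w_G, values, n))
        assert len(concat) == len(f_n), "concatenated heads must match feature size"
        out.append([a + r for a, r in zip(f_n, concat)])  # residual add
    return out

# Toy example: 2 objects with 2-d features, 2 heads each producing 1-d output.
f_A = [[1.0, 1.0], [2.0, 2.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
heads = [
    (zeros, [[1.0, 1.0], [1.0, 1.0]], [[1.0], [3.0]]),    # uniform weights
    (zeros, [[1.0, 0.0], [0.0, 1.0]], [[10.0], [20.0]]),  # each object attends to itself
]
print(relation_module(f_A, heads))  # [[3.0, 11.0], [4.0, 22.0]]
```

Subtracting the max of $w_A$ does not change $w^{mn}$, since the factor cancels in the ratio; it only keeps `exp` from overflowing.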

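The duplicate removal network is nothing fancy: it just predicts {0, 1} (correct vs. duplicate) for each detection. Still, since it contains a relation module, it should be good at removing duplicates.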
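A tiny sketch of the objective and scoring as I understand them: the duplicate removal network outputs a probability $s_1$ that a detection is correct, trained with BCE against the {0, 1} label, and the final score is the original detection score $s_0$ times $s_1$. The function names are hypothetical.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one detection: y = 1 (correct) or 0 (duplicate)."""
    p = min(max(p, eps), 1 - eps)  # clamp so log() stays finite
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def final_scores(s0, s1):
    """Final score = detection score s0 times duplicate-removal probability s1."""
    return [a * b for a, b in zip(s0, s1)]

print(bce(0.9, 1), bce(0.9, 0))
print(final_scores([0.9, 0.8], [1.0, 0.5]))
```

Because duplicates are down-weighted through the score rather than deleted by a hard rule, no NMS step is needed at the end.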

rank feature : instead of feeding the score directly, computing each detection's rank by score and embedding the rank worked better.
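A sketch of the rank feature: sort detections by score, replace each score with its rank, and sin/cos-embed the rank. As I understand it, the paper ranks among detections of the same class and uses a larger embedding; the per-class grouping is omitted here, and `dim=8` and `base=1000` are arbitrary choices.

```python
import math

def rank_embedding(scores, dim=8, base=1000.0):
    """Rank detections by score (rank 1 = highest) and sin/cos-embed each rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    half = dim // 2
    embs = []
    for r in ranks:
        emb = []
        for k in range(half):
            freq = 1.0 / (base ** (k / half))
            emb.extend([math.sin(r * freq), math.cos(r * freq)])
        embs.append(emb)
    return ranks, embs

ranks, embs = rank_embedding([0.2, 0.9, 0.5])
print(ranks)  # [3, 1, 2]
```

One plausible reason ranks help: they are comparable across images, whereas raw scores can be calibrated very differently from one image to the next.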
Correct/duplicate labels are assigned with an IoU threshold, and which metric (AP50, AP75, ...) came out best depended on the threshold used.
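A sketch of the label assignment, assuming it mirrors mAP-style matching: for each ground-truth box, the highest-scoring detection with IoU at or above the threshold is labeled correct (1), and every other detection is labeled duplicate (0). The toy boxes show how the labels, and therefore what the network learns, shift with the threshold.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def duplicate_labels(dets, scores, gts, thresh=0.5):
    """1 = correct, 0 = duplicate, for each detection."""
    labels = [0] * len(dets)
    for g in gts:
        cands = [i for i, d in enumerate(dets) if iou(d, g) >= thresh]
        if cands:
            best = max(cands, key=lambda i: scores[i])  # highest score among matches
            labels[best] = 1
    return labels

gts = [(0, 0, 10, 10)]
dets = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 6, 10)]  # IoU 1.0, 0.81, 0.6
scores = [0.7, 0.8, 0.9]
print(duplicate_labels(dets, scores, gts, 0.5))   # [0, 0, 1]
print(duplicate_labels(dets, scores, gts, 0.75))  # [0, 1, 0]
```

With a loose threshold the loosely localized box can be the "correct" one, which suits AP50; a strict threshold rewards tighter boxes, which suits AP75.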