image

paper

TL;DR

  • I read this because.. : it is an early two-stage SGG paper
  • task : two-stage SGG
  • problem : One of the early works in this line. Before this paper there seem to have been roughly Neural Motifs, #104, and SGG with iterative message passing.
  • idea : Turn each object into an enriched embedding and predict relations from it!
  • architecture : Faster-RCNN + build an embedding that represents each object, then use it to classify the relation class for $O(n^2)$ pairs. The inputs are a global feature + an embedding of the class predicted by the OD + the RoI visual feature + relative geometric information.
  • objective : 1) image-level multi-label loss over object classes 2) classification loss for each object 3) relation classification loss
  • baseline : Neural Motifs, #104, SGG with iterative message passing
  • data : Visual Genome
  • evaluation : SGdet, SGcls, PredCls
  • result : SOTA
  • contribution : simple!
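To make the $O(n^2)$ pair count in the architecture bullet concrete, here is a minimal sketch of enumerating candidate pairs. It assumes relations are directed, so every ordered (subject, object) pair with distinct indices is a candidate; `candidate_pairs` is a hypothetical helper name, not something from the paper.

```python
from itertools import permutations


def candidate_pairs(num_objects):
    """Return every ordered (subject, object) index pair with subject != object."""
    return list(permutations(range(num_objects), 2))


# With n detected objects there are n * (n - 1) ordered pairs to classify.
pairs = candidate_pairs(4)
print(len(pairs))  # 12
```

The relation classifier would then be run once per pair, which is where the quadratic cost comes from.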

Details

Architecture

image

  • Global Context Encoding Module: applies AvgPool to the feature, then attaches an FC layer for multi-label classification

  • Relation Embedding Module: builds the object feature $O_i$ from the embedding of the class $l_i$ predicted by the OD, the feature extracted by RoI pooling, and the context feature $c$ of the whole image, then stacks an FCN on top to predict the class
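The two modules above can be sketched in framework-free Python. This is an illustration under several assumptions, not the paper's implementation: the context vector $c$ is taken to be the average-pooled feature, the multi-label head uses independent sigmoids, and the three inputs to $O_i$ are combined by concatenation. The function names `global_context` and `build_object_feature` are hypothetical.

```python
import math


def global_context(feature_map, fc_weights, fc_biases):
    """feature_map is C x H x W (nested lists). Returns (c, per-class probabilities)."""
    # Global average pooling: one scalar per channel gives the context vector c.
    c = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # FC layer, then one independent sigmoid per object class (multi-label, assumed).
    logits = [sum(w * x for w, x in zip(row, c)) + b
              for row, b in zip(fc_weights, fc_biases)]
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return c, probs


def build_object_feature(class_id, roi_feature, c, class_embeddings):
    """Assumed combination: concatenate [class embedding ; RoI feature ; context c]."""
    return list(class_embeddings[class_id]) + list(roi_feature) + list(c)


# Toy example: 2 channels of size 2x2, 2 object classes.
feature_map = [[[1.0, 3.0], [1.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]]]
c, probs = global_context(feature_map, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
o_i = build_object_feature(1, [0.7, 0.9], c, {0: [0.1, 0.2], 1: [0.3, 0.4]})
```

In a real model the nested lists would be tensors and the weights learned; the point here is only the data flow from the image-level feature into each object's embedding.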

image

image

image

Geometric features are also fed in when predicting the relation.

image
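As an illustration of what a relative geometric feature can look like, the sketch below uses a common box-delta encoding: center-free offsets normalized by the subject box size, plus log size ratios. The exact encoding used in the paper is in the figure and is not reproduced in these notes, so treat this formula and the name `relative_geometry` as assumptions.

```python
import math


def relative_geometry(subject_box, object_box):
    """Boxes are (x, y, w, h). Returns a 4-d layout of the object relative to the subject."""
    xs, ys, ws, hs = subject_box
    xo, yo, wo, ho = object_box
    return [
        (xo - xs) / ws,       # horizontal offset in units of subject width
        (yo - ys) / hs,       # vertical offset in units of subject height
        math.log(wo / ws),    # log width ratio
        math.log(ho / hs),    # log height ratio
    ]


geo = relative_geometry((0.0, 0.0, 10.0, 20.0), (5.0, 10.0, 10.0, 20.0))
print(geo)  # [0.5, 0.5, 0.0, 0.0]
```

Normalizing by the subject box makes the feature invariant to image scale, which is why this style of encoding is often preferred over raw coordinates.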

Loss

image
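The three objectives listed in the TL;DR (image-level multi-label loss, per-object classification loss, relation classification loss) could be combined roughly as follows. This is a sketch only: the weights `w_image` and `w_object` are hypothetical parameters, and binary cross-entropy for the multi-label term is an assumption; the actual formula is in the figure above.

```python
import math


def binary_cross_entropy(probs, targets):
    """Mean BCE over classes, used here for the image-level multi-label term."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(probs, targets)) / len(probs)


def mean_cross_entropy(prob_rows, labels):
    """Mean negative log-likelihood of the correct class over a batch of predictions."""
    return -sum(math.log(row[y]) for row, y in zip(prob_rows, labels)) / len(labels)


def total_loss(image_probs, image_targets, object_probs, object_labels,
               relation_probs, relation_labels, w_image=1.0, w_object=1.0):
    l_image = binary_cross_entropy(image_probs, image_targets)   # 1) multi-label
    l_object = mean_cross_entropy(object_probs, object_labels)   # 2) object cls
    l_relation = mean_cross_entropy(relation_probs, relation_labels)  # 3) relation cls
    return w_image * l_image + w_object * l_object + l_relation


loss = total_loss([0.5, 0.5], [1, 0], [[0.5, 0.5]], [0], [[0.25, 0.75]], [1])
```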

Result

image