image

paper

TL;DR

  • task : deep saliency map
  • problem : ๊ธฐ์กด ๋ฐฉ๋ฒ•๋ก ๋“ค์€ 1) weak scale invariance 2) ์—ฌ๋Ÿฌ object ์ž˜ ๋ชป์ฐพ์Œ 3) distactor์— ์˜ํ–ฅ์„ ๋ฐ›์Œ 4) gradient based visualization์˜ ๊ฒฝ์šฐ noisyํ•จ 5) GradCAM ๊ฐ™์€ ๊ฒฝ์šฐ ๊ตฌ๋ถ„์ด ์ž˜ ์•ˆ๋จ(๋ฐ‘์˜ ๋ฑ€ ์‚ฌ์ง„ ๊ฐ™์€ ์˜ˆ์‹œ) 6) input size๊ฐ€ ๊ณ ์ •๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์•„์„œ resize๋ฅผ ํ•ด์•ผ๋˜๊ณ  ํ•ด์ƒ๋„์™€ ์ด๋ฏธ์ง€ ๋น„์œจ์ด ๋ฐ”๋€Œ์–ด์„œ ๊ฒฐ๊ณผ๊ฐ€ ์•ˆ์ข‹์Œ
  • idea : ์ด๋ฏธ์ง€๋ฅผ mult-scale๋กœ ์—ฌ๋Ÿฌ input์œผ๋กœ ๋‚˜๋ˆˆ ๋’ค์— sliding window๋กœ ์ž๋ฅด๊ณ , classification ํ–ˆ์„ ๋•Œ ํ•ด๋‹น class์— ๋Œ€ํ•œ scoring์ด ๋†’์€ ๊ฒƒ๋“ค์„ ๊ฐ€์ค‘ํ•ฉํ•ด์„œ saliency map์„ ๋งŒ๋“ค์ž
  • architecture : add-on ๋ฐฉ๋ฒ•๋ก ์ด๋ผ Guided-BP, CAM, GradCAM ๋“ฑ ์–ด๋А saliency map ๋ฝ‘๋Š” ๋ชจ๋ธ์ด๋“  ์ ์šฉํ•  ์ˆ˜ ์žˆ์Œ
  • baseline : vanilla deep saliency map methods, RISE, XRAI
  • data : ImageNet-1K, PASCAL VOC07, MSCOCO2014
  • result : pointing game ์ด๋ž€ setting์—์„œ GRAD-CAM๊ณผ ๊ฐ™์ด ์ผ์„ ๋•Œ SOTA. RISE, XRAI ๋ฐฉ๋ฒ•๋ก ๋ณด๋‹ค inference ์†๋„๊ฐ€ ๋น ๋ฆ„
  • contribution : ๋ฐฉ๋ฒ•๋ก  ์ง๊ด€์ ์ด๊ณ  ์‰ฌ์šด๋ฐ inference ์†๋„๊ฐ€ ๋น ๋ฅธ๊ฒƒ๊ณผ add-on ๋ฐ”
  • limitation or ์ดํ•ด ์•ˆ๋˜๋Š” ๋ถ€๋ถ„ :

Details

Methodology

image

Qualitative result

image

pointing game

https://link.springer.com/chapter/10.1007/978-3-319-46493-0_33 The goal of this section is to evaluate the discriminativeness of different top-down attention maps for localizing target objects in crowded visual scenes.

Evaluation setting. Given a pre-trained CNN classifier, we test different methods in generating a top-down attention map for a target object category present in an image. Ground truth object labels are used to cue the method. We extract the maximum point on the top-down attention map. A hit is counted if the maximum point lies on one of the annotated instances of the cued object category, otherwise a miss is counted. We measure the localization accuracy by ๐ด๐‘๐‘=#๐ป๐‘–๐‘ก๐‘ #๐ป๐‘–๐‘ก๐‘ +#๐‘€๐‘–๐‘ ๐‘ ๐‘’๐‘  for each object category. The overall performance is measured by the mean accuracy across different categories.