
TL;DR
- task : deep saliency map
- problem : existing methods 1) have weak scale invariance 2) fail to find multiple objects 3) are affected by distractors 4) are noisy in the case of gradient-based visualization 5) do not separate classes well in the case of Grad-CAM (as in the example image below) 6) often assume a fixed input size, so the image must be resized, which changes its resolution and aspect ratio and degrades the result
- idea : split the image into several multi-scale inputs, crop each with a sliding window, classify the crops, and take a weighted sum of the crops that score highly for the target class to build the saliency map
- architecture : it is an add-on method, so it can be applied to any model that produces a saliency map, such as Guided-BP, CAM, or Grad-CAM
- baseline : vanilla deep saliency map methods, RISE, XRAI
- data : ImageNet-1K, PASCAL VOC07, MSCOCO2014
- result : SOTA on the pointing game in some settings when combined with Grad-CAM. Inference is faster than RISE and XRAI
- contribution : the method is intuitive and simple, inference is fast, and it works as an add-on
- limitation or parts I did not understand :
Details
Methodology
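A minimal sketch of the idea from the TL;DR (multi-scale inputs, sliding-window crops, score-weighted sum), under stated assumptions rather than the paper's actual implementation. `score_fn` and `saliency_fn` are hypothetical stand-ins for the pre-trained classifier's target-class score and for the base saliency method (e.g. Grad-CAM). The scales, window size, stride, threshold, and nearest-neighbour resize are illustrative choices that do not come from the paper.

```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows][:, cols]

def multiscale_saliency(image, score_fn, saliency_fn,
                        scales=(1.0, 2.0), window=8, stride=4, threshold=0.5):
    """Score-weighted sum of per-window saliency maps over several scales.

    score_fn(patch) -> float: hypothetical target-class score of the classifier.
    saliency_fn(patch) -> (window, window) array: hypothetical base saliency method.
    """
    H, W = image.shape[:2]
    acc = np.zeros((H, W), dtype=np.float64)
    for s in scales:
        sh = max(window, int(round(H * s)))
        sw = max(window, int(round(W * s)))
        scaled = resize_nearest(image, sh, sw)
        level = np.zeros((sh, sw), dtype=np.float64)
        for top in range(0, sh - window + 1, stride):
            for left in range(0, sw - window + 1, stride):
                patch = scaled[top:top + window, left:left + window]
                score = float(score_fn(patch))
                if score < threshold:
                    continue  # keep only windows that score highly for the class
                level[top:top + window, left:left + window] += score * saliency_fn(patch)
        # Bring this scale's map back to the original resolution and accumulate.
        acc += resize_nearest(level, H, W)
    peak = acc.max()
    return acc / peak if peak > 0 else acc

# Toy example: a bright square plays the role of the target object.
image = np.zeros((16, 16))
image[4:8, 10:14] = 1.0
sal = multiscale_saliency(
    image,
    score_fn=lambda p: float(p.max()),       # toy "classifier"
    saliency_fn=lambda p: p.astype(float),   # toy "saliency method"
)
```

Because the aggregation only calls `score_fn` and `saliency_fn`, any base saliency method can be plugged in, which is what makes this kind of approach an add-on.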

Qualitative result

pointing game
https://link.springer.com/chapter/10.1007/978-3-319-46493-0_33 The goal of this section is to evaluate the discriminativeness of different top-down attention maps for localizing target objects in crowded visual scenes.
Evaluation setting. Given a pre-trained CNN classifier, we test different methods in generating a top-down attention map for a target object category present in an image. Ground truth object labels are used to cue the method. We extract the maximum point on the top-down attention map. A hit is counted if the maximum point lies on one of the annotated instances of the cued object category, otherwise a miss is counted. We measure the localization accuracy by $Acc = \frac{\#Hits}{\#Hits + \#Misses}$ for each object category. The overall performance is measured by the mean accuracy across different categories.
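The evaluation setting above can be sketched in a few lines of NumPy. This is an illustrative version only: the function names, the `(category, saliency, instance_mask)` sample format, and the toy data are assumptions, with `instance_mask` standing for the union of the annotated instances of the cued category.

```python
import numpy as np

def pointing_game_hit(saliency, instance_mask):
    """True if the saliency maximum lies on an annotated instance of the cued class."""
    r, c = np.unravel_index(np.argmax(saliency), saliency.shape)
    return bool(instance_mask[r, c])

def pointing_game_accuracy(samples):
    """samples: iterable of (category, saliency, instance_mask) triples.

    Returns per-category Acc = #Hits / (#Hits + #Misses) and the mean over categories.
    """
    hits, misses = {}, {}
    for category, saliency, mask in samples:
        bucket = hits if pointing_game_hit(saliency, mask) else misses
        bucket[category] = bucket.get(category, 0) + 1
    per_category = {}
    for category in set(hits) | set(misses):
        h, m = hits.get(category, 0), misses.get(category, 0)
        per_category[category] = h / (h + m)
    mean_acc = sum(per_category.values()) / len(per_category) if per_category else 0.0
    return per_category, mean_acc

# Toy data: one annotated object in the centre of a 4x4 image.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
on_object = np.zeros((4, 4))
on_object[2, 2] = 1.0    # maximum falls on the object: hit
off_object = np.zeros((4, 4))
off_object[0, 3] = 1.0   # maximum falls outside the object: miss

per_category, mean_acc = pointing_game_accuracy([
    ("cat", on_object, mask),
    ("cat", off_object, mask),
    ("dog", on_object, mask),
])
```

Averaging per category rather than over all images keeps frequent categories from dominating the overall score.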