paper

Collected VQA-X, a dataset that adds to VQA an explanation of why each question has the answer it does. The MPII Human Pose (MHP) dataset, shown on the right of the figure, describes what pose the person in a photo is in; since this also depends heavily on the surrounding objects and people, ACT-X was collected by adding free-text explanations for it. (c.f. CLEVR-X was also added recently)

In addition, labels marking where the evidence lies within the image serve as ground truth for pointing.

Based on these datasets, the Pointing and Justification Explanation (PJ-X) model is proposed, which gives an answer and an explanation for a given image and query.

results


idea

  • Conversely, these explanations could probably also be used as the explanations in few-shot examples.
  • What about building a dataset like this for DocVQA? Q: “Price of kkakdugi?” A: “500 won” X: “Because it is in the same row”
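
The two ideas above can be sketched together: a record holding a question, answer, and explanation (Q / A / X), and a helper that formats such records as few-shot demonstrations. This is only a rough sketch. The `ExplainedSample` schema, the `build_few_shot_prompt` helper, and the second query are all hypothetical and do not come from the paper or from DocVQA.

```python
from dataclasses import dataclass


@dataclass
class ExplainedSample:
    """One QA pair with a free-text explanation (hypothetical schema)."""
    question: str
    answer: str
    explanation: str


def build_few_shot_prompt(examples, query):
    """Format explained examples as demonstrations, then append the new question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.answer}\nX: {ex.explanation}"
        for ex in examples
    ]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


examples = [
    ExplainedSample(
        question="Price of kkakdugi?",
        answer="500 won",
        explanation="Because it is in the same row",
    ),
]
# The query below is a made-up placeholder for illustration.
prompt = build_few_shot_prompt(examples, "Price of kimchi?")
print(prompt)
```

Keeping the explanation as its own field means the same records could serve either as supervision for an explanation model or as demonstrations in a prompt.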

related papers