CHAIR (== Object HalBench)

[18’EMNLP] Object Hallucination in Image Captioning https://arxiv.org/abs/1809.02156

  • Uses COCO captions & semantic segmentation labels, together with a synonym list, to measure hallucination in captioning models

  • The denominator of CHAIR_i is the number of all mentioned objects // for CHAIR_s it is the number of sentences

  • COCO karpathy / robust test set

  • The point of this paper: even when captioning scores such as CIDEr are high, hallucination performance does not actually improve in proportion

  • For LVLMs, the model is given the 8 prompts from RLHF-V that ask for a detailed description, CHAIR is computed against the GT segmentation labels, and the result is reported as Object HalBench
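
The two denominators above can be made concrete with a small sketch. It is a minimal illustration rather than the official CHAIR script: the `SYNONYMS` table and the whitespace tokenizer are hypothetical stand-ins, and mentions are collected as a set, so an object repeated within one caption is counted once (an assumption).

```python
# Hypothetical synonym table: caption word -> COCO category (illustration only).
SYNONYMS = {
    "dog": "dog", "puppy": "dog",
    "cat": "cat", "kitten": "cat",
    "couch": "couch", "sofa": "couch",
    "frisbee": "frisbee",
}


def mentioned_objects(caption: str) -> set[str]:
    """Map caption words to COCO categories through the synonym table."""
    words = caption.lower().replace(".", " ").replace(",", " ").split()
    return {SYNONYMS[w] for w in words if w in SYNONYMS}


def chair(captions: list[str], gt_objects: list[set[str]]) -> tuple[float, float]:
    """Return (CHAIR_i, CHAIR_s) for captions paired with GT object sets."""
    n_mentioned = 0      # denominator of CHAIR_i: all mentioned objects
    n_hallucinated = 0   # numerator of CHAIR_i: mentioned but not in the image
    n_bad_captions = 0   # numerator of CHAIR_s: captions with any hallucination
    for caption, gt in zip(captions, gt_objects):
        mentioned = mentioned_objects(caption)
        wrong = mentioned - gt
        n_mentioned += len(mentioned)
        n_hallucinated += len(wrong)
        n_bad_captions += bool(wrong)
    chair_i = n_hallucinated / n_mentioned if n_mentioned else 0.0
    chair_s = n_bad_captions / len(captions) if captions else 0.0
    return chair_i, chair_s


captions = ["A puppy catches a frisbee.", "A kitten sleeps on the sofa."]
gt = [{"dog"}, {"cat", "couch"}]
chair_i, chair_s = chair(captions, gt)
```

Here one of the four mentioned objects (the frisbee) is absent from its image, and one of the two captions contains a hallucination, so the two scores differ even on the same outputs.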

POPE

[23’EMNLP] Evaluating Object Hallucination in Large Vision-Language Models https://arxiv.org/pdf/2305.10355

  • A paper that brings CHAIR-style object hallucination measurement (above) over to LVLMs

  • With CHAIR, however, scores fluctuate depending on how the prompt is written. Extracting objects and matching them to GT objects also requires complex human-written parsing rules

  • POPE is what the paper proposes in response

  • Instead of generating a caption and searching it for hallucinated objects, it builds questions that can be answered with yes / no and measures on those

  • GT labels: semantic labels are extracted with a segmentation model such as SEEM to augment the object pool

  • On top of this, 3 kinds of negative sets are built

    • random : randomly sampled object classes
    • popular : object classes that appear frequently in the training data
    • adversarial : object classes that frequently co-occur with the objects present in the image
  • Evaluation set: a subset of 500 COCO images, each containing 3 or more objects

  • Findings of this paper: hallucination is severe for objects that 1) appear frequently in COCO, or 2) frequently co-occur with other objects in COCO
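
The three negative-sampling strategies can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: `build_stats`, `sample_negatives`, and `make_questions` are hypothetical helper names, frequency and co-occurrence are counted over a toy annotation list, and the question template is assumed.

```python
import random
from collections import Counter
from itertools import combinations


def build_stats(annotations: list[set[str]]) -> tuple[Counter, Counter]:
    """Count how often each object appears and how often pairs co-occur."""
    freq, cooc = Counter(), Counter()
    for objs in annotations:
        freq.update(objs)
        for a, b in combinations(sorted(objs), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    return freq, cooc


def sample_negatives(present, freq, cooc, k, strategy, seed=0):
    """Pick k objects that are NOT in the image, according to the strategy."""
    candidates = sorted(o for o in freq if o not in present)
    if strategy == "random":
        return random.Random(seed).sample(candidates, min(k, len(candidates)))
    if strategy == "popular":
        # Most frequent objects in the dataset overall.
        ranked = sorted(candidates, key=lambda o: (-freq[o], o))
    elif strategy == "adversarial":
        # Objects that most often co-occur with the ones actually present.
        ranked = sorted(
            candidates, key=lambda o: (-sum(cooc[(p, o)] for p in present), o)
        )
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]


def make_questions(present, negatives):
    """Build (question, gold answer) pairs; the template is an assumption."""
    template = "Is there a {} in the image?"
    questions = [(template.format(o), "yes") for o in sorted(present)]
    questions += [(template.format(o), "no") for o in negatives]
    return questions


annotations = [
    {"person", "dog"}, {"person", "car"}, {"person", "dog", "frisbee"},
    {"car", "truck"}, {"dog", "frisbee"},
]
freq, cooc = build_stats(annotations)
popular = sample_negatives({"dog"}, freq, cooc, 2, "popular")
adversarial = sample_negatives({"dog"}, freq, cooc, 2, "adversarial")
questions = make_questions({"dog"}, adversarial)
```

On this toy data the popular negatives for an image containing only a dog are `person` and `car`, while the adversarial negatives are `frisbee` and `person`, which is why the adversarial setting tends to be the harder one.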

HallusionBench

[CVPR'24] HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models https://arxiv.org/abs/2310.14566

AMBER

[arxiv'24] AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation https://arxiv.org/abs/2311.07397

  • Two evaluation modes: 1) generative, 2) discriminative

  • Generative is designed for object existence; discriminative can cover object, relation, and attribute hallucination

  • Each image is annotated in advance with the object, attribute, and relation labels that appear in it

  • Discriminative: the model simply answers yes / no and is checked against those labels

  • Generative: nouns are parsed from the generated caption, and after that the scoring looks like it is basically just CHAIR
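
A rough sketch of how the discriminative side could be scored per dimension. The record layout `(dimension, gold, answer)` and the "starts with yes" parsing rule are assumptions made for illustration, and only accuracy is computed here.

```python
from collections import defaultdict


def discriminative_accuracy(records):
    """Per-dimension accuracy for yes/no questions.

    records: iterable of (dimension, gold, answer), where dimension is e.g.
    "object", "attribute", or "relation", gold is "yes" or "no", and answer
    is the raw model response.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dimension, gold, answer in records:
        # Assumed parsing rule: anything not starting with "yes" counts as "no".
        predicted = "yes" if answer.strip().lower().startswith("yes") else "no"
        total[dimension] += 1
        correct[dimension] += predicted == gold
    return {d: correct[d] / total[d] for d in total}


records = [
    ("object", "yes", "Yes, there is."),
    ("object", "no", "Yes"),
    ("attribute", "no", "No."),
    ("relation", "yes", "no"),
]
scores = discriminative_accuracy(records)
```

Because the labels are annotated ahead of time, this scoring needs no LLM judge, only string matching on the model's answer.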