[164] TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

July 18, 2024 · 1 min · long8v

View original issue on GitHub →

Table of contents

TL;DR
Details

paper, page, code

TL;DR

I read this because... : it is related to my own research
task : faithful T2I evaluation
problem : CLIPScore has shortcomings when used to judge whether a generated image matches its prompt
idea : solve it with VQA!
input/output : {image, text} -> score
architecture : GPT-3 + UnifiedQA + VQA (mPLUG-large, BLIP-2)
baseline : CLIPScore
evaluation : correlation with human preference rated on a Likert scale
result : higher correlation than the baseline

Details

motivation

TIFA overview

The metric is how many of the questions the VQA model answers correctly for the generated image.
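To make the metric concrete, here is a minimal sketch of the scoring step, assuming the score is the fraction of questions answered correctly. `QAPair`, `tifa_style_score`, and the `vqa_model` callable are hypothetical names used for illustration only; they are not the API of the released TIFA code.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class QAPair:
    question: str
    choices: List[str]
    answer: str  # expected answer, derived from the text prompt


def tifa_style_score(
    image: Any,
    qa_pairs: List[QAPair],
    vqa_model: Callable[[Any, str, List[str]], str],
) -> float:
    """Return the fraction of questions the VQA model answers correctly.

    `vqa_model` is an assumed callable (image, question, choices) -> answer;
    in practice it would wrap a model such as mPLUG-large or BLIP-2.
    """
    if not qa_pairs:
        raise ValueError("need at least one question")
    correct = 0
    for qa in qa_pairs:
        predicted = vqa_model(image, qa.question, qa.choices)
        # Compare case-insensitively and ignore surrounding whitespace.
        if predicted.strip().lower() == qa.answer.strip().lower():
            correct += 1
    return correct / len(qa_pairs)
```

Passing the VQA model in as a plain callable keeps the scoring logic independent of any particular model, so different VQA backends can be compared on the same question set.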

GPT-3 prompt

TIFA detailed pipeline

Largely the same as #182! The difference is that everything here is done with GPT-3. To make question generation deterministic, they also fine-tuned LLaMA 2.

Question filtering is done with UnifiedQA.
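One plausible reading of the filtering step, sketched below, is that a generated question is kept only when a text-only QA model, given the prompt as context, reproduces the expected answer. This rule is an assumption made for illustration; `filter_questions` and the `text_qa` callable are hypothetical names, not the UnifiedQA interface.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]  # keys: "question", "choices", "answer"


def filter_questions(
    prompt: str,
    candidates: List[Candidate],
    text_qa: Callable[[str, str, List[str]], str],
) -> List[Candidate]:
    """Keep candidates whose expected answer a text-only QA model can recover.

    `text_qa` is an assumed callable (context, question, choices) -> answer
    that would wrap a model such as UnifiedQA.
    """
    kept = []
    for cand in candidates:
        predicted = text_qa(prompt, str(cand["question"]), list(cand["choices"]))
        # A mismatch suggests the question or answer is not grounded in the prompt.
        if predicted.strip().lower() == str(cand["answer"]).strip().lower():
            kept.append(cand)
    return kept
```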

TIFA v1.0 benchmark

Likert Score guideline

correlation with human preference
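As an illustration of how a metric can be compared against Likert ratings, the sketch below computes a rank correlation between per-image metric scores and human scores. Using Spearman's ρ here is an assumption; these notes only say "correlation". The function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A rank correlation suits Likert data because the ratings are ordinal: only the ordering of images matters, not the spacing between rating levels.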