[164] TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

July 18, 2024 · 1 min · long8v

[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

February 12, 2024 · 1 min · long8v

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

February 11, 2024 · 2 min · long8v

[147] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

February 7, 2024 · 2 min · long8v

[44] Context-Aware Scene Graph Generation With Seq2Seq Transformers

August 2, 2022 · 2 min · long8v

[38] Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

July 22, 2022 · 2 min · long8v