[164] TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

July 18, 2024 · 1 min · long8v

[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

February 12, 2024 · 1 min · long8v

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

February 11, 2024 · 2 min · long8v

[147] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

February 7, 2024 · 3 min · long8v

[44] Context-Aware Scene Graph Generation With Seq2Seq Transformers

August 2, 2022 · 3 min · long8v

[38] Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

July 22, 2022 · 2 min · long8v