[177] Fine-grained Image Captioning with CLIP Reward

September 6, 2024 · 2 min · long8v

[116] Data Distributional Properties Drive Emergent In-Context Learning in Transformers

May 22, 2023 · 3 min · long8v

[55] Position Prediction as an Effective Pretraining Strategy

August 26, 2022 · 1 min · long8v

[31] GIT: A Generative Image-to-text Transformer for Vision and Language

June 26, 2022 · 2 min · long8v