[177] Fine-grained Image Captioning with CLIP Reward

September 6, 2024 · 2 min · long8v

[116] Data Distributional Properties Drive Emergent In-Context Learning in Transformers

May 22, 2023 · 2 min · long8v

[55] Position Prediction as an Effective Pretraining Strategy

August 26, 2022 · 1 min · long8v

[31] GIT: A Generative Image-to-text Transformer for Vision and Language

June 26, 2022 · 2 min · long8v

[30] CoCa: Contrastive Captioners are Image-Text Foundation Models

June 22, 2022 · 1 min · long8v