[136] Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models

November 28, 2023 · 2 min · long8v

[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

September 13, 2023 · 2 min · long8v

[32] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

June 28, 2022 · 1 min · long8v