[136] Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models

November 28, 2023 · 3 min · long8v

[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

September 13, 2023 · 2 min · long8v

[32] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

June 28, 2022 · 2 min · long8v