[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

May 21, 2025 · 2 min · long8v

[195] STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning

January 9, 2025 · 1 min · long8v

[163] What You See is What You Read? Improving Text-Image Alignment Evaluation

July 18, 2024 · 2 min · long8v

[155] Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

May 3, 2024 · 2 min · long8v

[154] Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

April 3, 2024 · 2 min · long8v

[139] Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation

December 11, 2023 · 2 min · long8v

[128] Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

August 21, 2023 · 3 min · long8v

[124] LiT: Zero-Shot Transfer with Locked-image text Tuning

July 6, 2023 · 4 min · long8v

[118] PaLI-X: On Scaling up a Multilingual Vision and Language Model

June 8, 2023 · 4 min · long8v

[114] MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

May 9, 2023 · 3 min · long8v

[92] Long-Tail Learning via Logit Adjustment

December 26, 2022 · 1 min · long8v

[59] MLP-Mixer: An all-MLP Architecture for Vision

September 1, 2022 · 1 min · long8v

[30] CoCa: Contrastive Captioners are Image-Text Foundation Models

June 22, 2022 · 2 min · long8v

[23] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

April 25, 2022 · 3 min · long8v

[18] Deep Learning with Differential Privacy

April 4, 2022 · 1 min · long8v

[9] SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

January 25, 2022 · 3 min · long8v