[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training google RL Berkeley 2025Q1
[163] What You See is What You Read? Improving Text-Image Alignment Evaluation google NeurIPS 2023Q2 evaluation
[155] Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings google evaluation generation 2024Q2
[154] Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment google XAI evaluation 2024Q2
[139] Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation google 2023Q4 evaluation generation
[128] Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding ICML google 2022Q3 document
[73] Simple Open-Vocabulary Object Detection with Vision Transformers google object detection 2022Q2 25min ECCV OV
[9] SimCLR: A Simple Framework for Contrastive Learning of Visual Representations few-shot SSL 2020Q3 ICML google