[196] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

January 17, 2025 · 2 min · long8v

[167] Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

July 24, 2024 · 2 min · long8v

[165] Rich Human Feedback for Text-to-Image Generation

July 19, 2024 · 2 min · long8v

feat: add text span

May 7, 2024 · 1 min · long8v

[156] Interpreting CLIP's Image Representation via Text-Based Decomposition

May 6, 2024 · 2 min · long8v

[143] Honeybee: Locality-enhanced Projector for Multimodal LLM

December 22, 2023 · 3 min · long8v

[141] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

December 15, 2023 · 3 min · long8v

[139] Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation

December 11, 2023 · 2 min · long8v

[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

December 8, 2023 · 2 min · long8v

[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

December 5, 2023 · 3 min · long8v