[196] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

January 17, 2025 · 2 min · long8v

[167] Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

July 24, 2024 · 2 min · long8v

[165] Rich Human Feedback for Text-to-Image Generation

July 19, 2024 · 2 min · long8v

feat: add text span

May 7, 2024 · 1 min · long8v

[156] Interpreting CLIP's Image Representation via Text-Based Decomposition

May 6, 2024 · 2 min · long8v

[143] Honeybee: Locality-enhanced Projector for Multimodal LLM

December 22, 2023 · 3 min · long8v

[141] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

December 15, 2023 · 2 min · long8v

[139] Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation

December 11, 2023 · 2 min · long8v

[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

December 8, 2023 · 2 min · long8v

[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

December 5, 2023 · 2 min · long8v