[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 · 1 min · long8v

[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

January 3, 2025 · 4 min · long8v

[191] Critique-out-Loud Reward Models

December 17, 2024 · 2 min · long8v

[186] The Llama 3 Herd of Models

November 15, 2024 · 8 min · long8v

[184] Improve Vision Language Model Chain-of-thought Reasoning

October 29, 2024 · 2 min · long8v

[183] MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

October 24, 2024 · 2 min · long8v

[180] Phantom of Latent for Large Language and Vision Models

September 30, 2024 · 1 min · long8v

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

September 4, 2024 · 2 min · long8v

[171] CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

August 30, 2024 · 2 min · long8v