[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 Β· 1 min Β· long8v

[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

January 3, 2025 Β· 4 min Β· long8v

[191] Critique-out-Loud Reward Models

December 17, 2024 Β· 2 min Β· long8v

[186] The Llama 3 Herd of Models

November 15, 2024 Β· 6 min Β· long8v

[184] Improve Vision Language Model Chain-of-thought Reasoning

October 29, 2024 Β· 2 min Β· long8v

[183] MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

October 24, 2024 Β· 2 min Β· long8v

[180] Phantom of Latent for Large Language and Vision Models

September 30, 2024 Β· 1 min Β· long8v

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

September 4, 2024 Β· 2 min Β· long8v

[171] CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

August 30, 2024 Β· 2 min Β· long8v