[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 · 1 min · long8v

[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

January 3, 2025 · 4 min · long8v

[191] Critique-out-Loud Reward Models

December 17, 2024 · 2 min · long8v

[186] The Llama 3 Herd of Models

November 15, 2024 · 6 min · long8v

[184] Improve Vision Language Model Chain-of-thought Reasoning

October 29, 2024 · 2 min · long8v

[183] MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

October 24, 2024 · 2 min · long8v

[180] Phantom of Latent for Large Language and Vision Models

September 30, 2024 · 1 min · long8v

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

September 4, 2024 · 2 min · long8v

[171] CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

August 30, 2024 · 2 min · long8v