[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 Β· 1 min Β· long8v

[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

January 3, 2025 Β· 4 min Β· long8v

[191] Critique-out-Loud Reward Models

December 17, 2024 Β· 2 min Β· long8v

[186] The Llama 3 Herd of Models

November 15, 2024 Β· 6 min Β· long8v

[184] Improve Vision Language Model Chain-of-thought Reasoning

October 29, 2024 Β· 2 min Β· long8v

[183] MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

October 24, 2024 Β· 2 min Β· long8v

[180] Phantom of Latent for Large Language and Vision Models

September 30, 2024 Β· 1 min Β· long8v

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

September 4, 2024 Β· 2 min Β· long8v

[171] CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

August 30, 2024 Β· 2 min Β· long8v