[205] LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Berkeley reasoning 2025Q1
[201] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment RL reasoning 2025Q1
[199] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning RL reasoning 2025Q1
[196] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations ACL RL 2023Q4 reasoning
[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters DeepMind 2024Q3 reasoning
[193] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective survey 2024Q4 reasoning