[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Google RL Berkeley 2025Q1
[208] FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | 25min RL 2025Q1
[206] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | 25min RL MLLM 2025Q1
[207] MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | RL MLLM 2025Q1
[205] LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters! | Berkeley reasoning 2025Q1
[201] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | RL reasoning 2025Q1
[200] Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | 25min RL 2025Q1 THU
[199] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | RL reasoning 2025Q1