[213] Skywork-R1V3 Technical Report

July 11, 2025 · 3 min · long8v

[211] Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

July 2, 2025 · 2 min · long8v

[212] MiMo-VL Technical Report

July 2, 2025 · 3 min · long8v

[210] Weight Ensembling Improves Reasoning in Language Models

May 30, 2025 · 2 min · long8v

[205] LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters!

February 28, 2025 · 2 min · long8v

[204] DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

February 19, 2025 · 2 min · long8v

[201] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

February 8, 2025 · 3 min · long8v

[199] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

January 24, 2025 · 2 min · long8v

[198] Kimi k1.5: Scaling Reinforcement Learning with LLMs

January 23, 2025 · 4 min · long8v

[196] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

January 17, 2025 · 2 min · long8v

[195] STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning

January 9, 2025 · 1 min · long8v

[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

January 3, 2025 · 4 min · long8v

[193] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

December 30, 2024 · 3 min · long8v