[213] Skywork-R1V3 Technical Report

July 11, 2025 · 3 min · long8v

[211] Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

July 2, 2025 · 2 min · long8v

[212] MiMo-VL Technical Report

July 2, 2025 · 2 min · long8v

[210] Weight Ensembling Improves Reasoning in Language Models

May 30, 2025 · 2 min · long8v

[205] LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

February 28, 2025 · 2 min · long8v

[204] DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

February 19, 2025 · 2 min · long8v

[201] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

February 8, 2025 · 2 min · long8v

[199] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

January 24, 2025 · 2 min · long8v

[198] Kimi k1.5: Scaling Reinforcement Learning with LLMs

January 23, 2025 · 3 min · long8v

[196] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

January 17, 2025 · 2 min · long8v

[195] STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning

January 9, 2025 · 1 min · long8v

[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

January 3, 2025 · 4 min · long8v

[193] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

December 30, 2024 · 3 min · long8v

[192] Scaling Test-time Compute with Open Models (hf blog)

December 23, 2024 · 5 min · long8v