[220] VideoRoPE: What Makes for Good Video Rotary Position Embedding?

November 25, 2025 · 2 min · long8v

[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

May 21, 2025 · 2 min · long8v

[208] FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models

March 27, 2025 · 1 min · long8v

[206] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

March 12, 2025 · 1 min · long8v

[207] MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

March 12, 2025 · 2 min · long8v

[205] LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

February 28, 2025 · 2 min · long8v

[204] DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

February 19, 2025 · 2 min · long8v

[201] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

February 8, 2025 · 2 min · long8v

[200] Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

February 3, 2025 · 2 min · long8v

[199] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

January 24, 2025 · 2 min · long8v

[198] Kimi k1.5: Scaling Reinforcement Learning with LLMs

January 23, 2025 · 3 min · long8v