[220] VideoRoPE: What Makes for Good Video Rotary Position Embedding?

November 25, 2025 · 2 min · long8v

[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

May 21, 2025 · 2 min · long8v

[208] FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models

March 27, 2025 · 1 min · long8v

[206] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

March 12, 2025 · 1 min · long8v

[207] MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

March 12, 2025 · 3 min · long8v

[205] LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!

February 28, 2025 · 2 min · long8v

[204] DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

February 19, 2025 · 2 min · long8v

[201] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

February 8, 2025 · 3 min · long8v

[200] Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

February 3, 2025 · 2 min · long8v

[199] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

January 24, 2025 · 2 min · long8v

[198] Kimi k1.5: Scaling Reinforcement Learning with LLMs

January 23, 2025 · 4 min · long8v