
paper
TL;DR#
- I read this because.. : I was browsing GitHub and saw a curriculum-learning repo.
- task : reasoning LLM
- problem : I want to learn curriculum learning for RL training of reasoning models (similar to DeepScaleR)
- idea : longer prompts tend to be harder problems, so train on a curriculum ordered by prompt length
- architecture :
- objective : GRPO loss
- baseline : DeepSeek-R1-Distill-Qwen-1.5B, STILL-1.5B, DeepScaleR-1.5B-Preview, rStar-Math-7B, Qwen2.5-Math-7B-Instruct, Qwen2.5-7B-SimpleRL, and Eurus-2-7B-PRIME
- data : AIME problems (1984-2023), AMC problems (before 2023), Omni-MATH dataset, Still dataset
- evaluation : MATH 500, AIME 2024, AMC 2023, Minerva Math, and OlympiadBench
- result : good performance compared to the baselines; training cost is about 50% of DeepScaleR's
- contribution :
- etc. :
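The two technical bullets above (length-based curriculum, GRPO loss) can be sketched as follows. This is a minimal illustration, not the paper's actual code: `curriculum_stages` and `grpo_advantages` are hypothetical helper names, and "length" here is just string length standing in for tokenized prompt length.

```python
# Hedged sketch of the two core ideas, under my own assumptions:
# (1) curriculum: split prompts into stages, shortest (assumed easiest) first;
# (2) GRPO: normalize each sampled completion's reward by its group's
#     mean and std instead of using a learned value critic.
from statistics import mean, pstdev

def curriculum_stages(prompts, n_stages=3):
    """Sort prompts by length and split them into n_stages buckets,
    shortest first. A real setup would sort by token count."""
    ordered = sorted(prompts, key=len)
    size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: subtract the group mean
    and divide by the group std (epsilon avoids division by zero)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]
```

Training would then run GRPO stage by stage, moving to the next (longer-prompt) bucket once the current one is learned; the group-normalized advantages replace the critic term in the policy-gradient loss.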
Details#
