
paper

TL;DR

  • I read this because: I came across it while browsing GitHub and it's about curriculum learning
  • task : reasoning LLM
  • problem : wants to do curriculum learning (similar to DeepScaleR)
  • idea : longer prompts are likely more complex/harder
  • architecture :
  • objective : GRPO loss
  • baseline : DeepSeek-R1-Distill-Qwen-1.5B, STILL-1.5B, DeepScaleR-1.5B-Preview, rStar-Math-7B, Qwen2.5-Math-7B-Instruct, Qwen2.5-7B-SimpleRL, and Eurus-2-7B-PRIME
  • data : AIME problems (1984-2023), AMC problems (before 2023), Omni-MATH dataset, Still dataset
  • evaluation : MATH 500, AIME 2024, AMC 2023, Minerva Math, and OlympiadBench
  • result : better performance than the baselines; training cost is about 50% of DeepScaleR's
  • contribution :
  • etc. :
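The core idea above (use prompt length as a difficulty proxy for curriculum stages, train with a GRPO objective) can be sketched roughly as follows. This is my own illustrative sketch, not the paper's code: `split_by_length` and `grpo_advantages` are hypothetical names, and the advantage computation is just the group-normalized reward at the heart of GRPO, omitting the clipped policy-ratio loss and KL term.

```python
import statistics

def split_by_length(prompts, n_stages=3):
    """Partition prompts into curriculum stages, shortest (assumed easiest) first.
    Length here is raw string length; the paper presumably uses token counts."""
    ordered = sorted(prompts, key=len)
    stage_size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled response's reward
    against the mean/std of the group sampled for the same prompt."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Tiny demo: stage the data, then score one sampled group.
stages = split_by_length(["ab", "abcdef", "a", "abcd", "abc", "abcde"], n_stages=3)
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Here the curriculum just means training on `stages[0]` first, then moving to later stages; the advantages would weight the log-probabilities in the usual clipped GRPO surrogate loss.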

Details
