[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Google RL Berkeley 2025Q1
[205] LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters! Berkeley reasoning 2025Q1