[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

May 21, 2025 · 2 min · long8v

[205] LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

February 28, 2025 · 2 min · long8v

[179] Aligning Large Multimodal Models with Factually Augmented RLHF

September 25, 2024 · 1 min · long8v