[203] DeepSeek-V3 Technical Report

February 13, 2025 · 2 min · long8v

[197] Free Process Rewards without Process Labels

January 20, 2025 · 1 min · long8v

[193] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

December 30, 2024 · 3 min · long8v

[188] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

December 2, 2024 · 1 min · long8v

[187] Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

November 21, 2024 · 2 min · long8v

[185] LLaVA-OneVision: Easy Visual Task Transfer

November 12, 2024 · 1 min · long8v