[203] DeepSeek-V3 Technical Report

February 13, 2025 · 2 min · long8v

[197] Free Process Rewards without Process Labels

January 20, 2025 · 1 min · long8v

[193] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

December 30, 2024 · 3 min · long8v

[188] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

December 2, 2024 · 2 min · long8v

[187] Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

November 21, 2024 · 2 min · long8v

[185] LLaVA-OneVision: Easy Visual Task Transfer

November 12, 2024 · 1 min · long8v