2024Q4 | 🍎 Paper Today I Read 🦔

[203] DeepSeek-V3 Technical Report

WIP 25min LLM RL 2024Q4

[197] Free Process Rewards without Process Labels

25min RL 2024Q4

[193] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

survey 2024Q4 reasoning

[192] Scaling Test-time Compute with Open Models (Hugging Face blog)

2024Q4 test-time-scaling reasoning

[188] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

[187] Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

RL MLLM 2024Q4 SHU

[185] LLaVA-OneVision: Easy Visual Task Transfer

25min MLLM 2024Q4