[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 · 1 min · long8v

[215] Group Sequence Policy Optimization

August 1, 2025 · 2 min · long8v

[203] DeepSeek-V3 Technical Report

February 13, 2025 · 2 min · long8v

[191] Critique-out-Loud Reward Models

December 17, 2024 · 2 min · long8v

[186] The Llama 3 Herd of Models

November 15, 2024 · 6 min · long8v

[181] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

October 7, 2024 · 2 min · long8v

[176] Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

September 5, 2024 · 2 min · long8v

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

September 4, 2024 · 2 min · long8v

[140] Improved Baselines with Visual Instruction Tuning

December 12, 2023 · 2 min · long8v

[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

December 5, 2023 · 2 min · long8v

[109] 🦩 Flamingo: a Visual Language Model for Few-Shot Learning

April 10, 2023 · 3 min · long8v

[106] Prefix-Tuning: Optimizing Continuous Prompts for Generation

March 28, 2023 · 1 min · long8v

[105] LoRA: Low-Rank Adaptation of Large Language Models

March 27, 2023 · 2 min · long8v