[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 · 1 min · long8v

[215] Group Sequence Policy Optimization

August 1, 2025 · 3 min · long8v

[203] DeepSeek-V3 Technical Report

February 13, 2025 · 2 min · long8v

[191] Critique-out-Loud Reward Models

December 17, 2024 · 2 min · long8v

[186] The Llama 3 Herd of Models

November 15, 2024 · 8 min · long8v

[181] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

October 7, 2024 · 2 min · long8v

[176] Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

September 5, 2024 · 2 min · long8v

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

September 4, 2024 · 2 min · long8v

[140] Improved Baselines with Visual Instruction Tuning

December 12, 2023 · 3 min · long8v

[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

December 5, 2023 · 3 min · long8v

[109] 🦩 Flamingo: a Visual Language Model for Few-Shot Learning

April 10, 2023 · 4 min · long8v

[106] Prefix-Tuning: Optimizing Continuous Prompts for Generation

March 28, 2023 · 1 min · long8v

[105] LoRA: Low-Rank Adaptation of Large Language Models

March 27, 2023 · 2 min · long8v