LLM | 🍎 Paper Today I Read 🦔

[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

dataset LLM 2024Q3

[215] Group Sequence Policy Optimization

[203] DeepSeek-V3 Technical Report

WIP 25min LLM RL 2024Q4

[191] Critique-out-Loud Reward Models

AllenAI LLM RL 2024Q3

[186] The Llama 3 Herd of Models

LLM meta 2024Q3

[181] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

[176] Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

ICML LLM RL 2024Q3

[140] Improved Baselines with Visual Instruction Tuning

multimodal LLM 2023Q3 MLLM

[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

multimodal LLM 2023Q4 alibaba

[109] 🦩 Flamingo: a Visual Language Model for Few-Shot Learning

multimodal DeepMind LLM

[106] Prefix-Tuning: Optimizing Continuous Prompts for Generation

2021Q1 25min finetuning LLM ACL

[105] LoRA: Low-Rank Adaptation of Large Language Models

2021Q2 microsoft finetuning LLM

[104] GPT Understands, Too

2021Q1 prompt GPT finetuning LLM