[214] Learning to Model the World With Language

July 17, 2025 · 4 min · long8v

[181] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

October 7, 2024 · 2 min · long8v

[179] Aligning Large Multimodal Models with Factually Augmented RLHF

September 25, 2024 · 2 min · long8v

[173] Detecting and Preventing Hallucinations in Large Vision Language Models

August 30, 2024 · 2 min · long8v

[164] TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

July 18, 2024 · 1 min · long8v

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

February 11, 2024 · 2 min · long8v

[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

December 26, 2023 · 2 min · long8v

[140] Improved Baselines with Visual Instruction Tuning

December 12, 2023 · 3 min · long8v