[214] Learning to Model the World With Language

July 17, 2025 · 3 min · long8v

[181] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

October 7, 2024 · 2 min · long8v

[179] Aligning Large Multimodal Models with Factually Augmented RLHF

September 25, 2024 · 1 min · long8v

[173] Detecting and Preventing Hallucinations in Large Vision Language Models

August 30, 2024 · 2 min · long8v

[164] TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

July 18, 2024 · 1 min · long8v

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

February 11, 2024 · 2 min · long8v

[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

December 26, 2023 · 2 min · long8v

[140] Improved Baselines with Visual Instruction Tuning

December 12, 2023 · 2 min · long8v