2023Q3 | 🍎 Paper Today I Read 🦔

[214] Learning to Model the World With Language

ICML RL 2023Q3 WORLD-MODEL

[181] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

[179] Aligning Large Multimodal Models with Factually Augmented RLHF

25min RL 2023Q3 MLLM Berkeley

[173] Detecting and Preventing Hallucinations in Large Vision Language Models

AAAI RL 2023Q3 MLLM ScaleAI

[164] TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

ICCV evaluation 2023Q3

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

ICCV 25min CLIP 2023Q3 AI2

[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

multilingual alibaba 2023Q3 MLLM qwen

[140] Improved Baselines with Visual Instruction Tuning

multimodal LLM 2023Q3 MLLM