[222] Qwen3-VL Technical Report

March 9, 2026 · 6 min · long8v

[220] VideoRoPE: What Makes for Good Video Rotary Position Embedding?

November 25, 2025 · 2 min · long8v

[219] GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

November 12, 2025 · 4 min · long8v

[218] Qwen2.5-VL Technical Report

November 10, 2025 · 4 min · long8v

[216] Emerging Properties in Unified Multimodal Pretraining

September 4, 2025 · 4 min · long8v

[213] Skywork-R1V3 Technical Report

July 11, 2025 · 3 min · long8v

[211] Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

July 2, 2025 · 2 min · long8v

[212] MiMo-VL Technical Report

July 2, 2025 · 3 min · long8v

[206] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

March 12, 2025 · 1 min · long8v

[207] MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

March 12, 2025 · 3 min · long8v

[188] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

December 2, 2024 · 2 min · long8v

[187] Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

November 21, 2024 · 2 min · long8v

[185] LLaVA-OneVision: Easy Visual Task Transfer

November 12, 2024 · 1 min · long8v

[184] Improve Vision Language Model Chain-of-thought Reasoning

October 29, 2024 · 2 min · long8v

[183] MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

October 24, 2024 · 2 min · long8v

[182] Calibrated Self-Rewarding Vision Language Models

October 10, 2024 · 2 min · long8v

[180] Phantom of Latent for Large Language and Vision Models

September 30, 2024 · 1 min · long8v

[179] Aligning Large Multimodal Models with Factually Augmented RLHF

September 25, 2024 · 2 min · long8v

[178] RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

September 23, 2024 · 3 min · long8v

[174] Evaluations for Object Hallucinations

September 2, 2024 · 2 min · long8v

[171] CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

August 30, 2024 · 2 min · long8v

[172] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

August 30, 2024 · 2 min · long8v

[173] Detecting and Preventing Hallucinations in Large Vision Language Models

August 30, 2024 · 2 min · long8v

[166] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

July 23, 2024 · 2 min · long8v

[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

December 26, 2023 · 2 min · long8v

[143] Honeybee: Locality-enhanced Projector for Multimodal LLM

December 22, 2023 · 3 min · long8v

[140] Improved Baselines with Visual Instruction Tuning

December 12, 2023 · 3 min · long8v

[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

December 8, 2023 · 2 min · long8v