[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond (tags: multilingual alibaba 2023Q3 MLLM qwen)
[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration (tags: multimodal LLM 2023Q4 alibaba)