[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | multimodal dataset 2023Q4 MLLM
[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text | multimodal dataset NeurIPS 2023Q2
[108] Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships | 2022Q1 dataset CVPR graph
[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations | dataset 2022Q3 25min ECCV nvidia CLIP
[19] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | multimodal 2018 dataset