[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 · 1 min · long8v

[151] FOIL it! Find One mismatch between Image and Language caption

March 3, 2024 · 2 min · long8v

[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

December 8, 2023 · 2 min · long8v

[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

November 23, 2023 · 2 min · long8v

[133] DataComp: In search of the next generation of multimodal datasets

October 5, 2023 · 2 min · long8v

[108] Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships

April 4, 2023 · 3 min · long8v

[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations

November 4, 2022 · 2 min · long8v

[41] Panoptic Scene Graph Generation

August 1, 2022 · 1 min · long8v

[19] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

April 6, 2022 · 1 min · long8v