[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 · 1 min · long8v

[151] FOIL it! Find One mismatch between Image and Language caption

March 3, 2024 · 3 min · long8v

[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

December 8, 2023 · 2 min · long8v

[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

November 23, 2023 · 3 min · long8v

[133] DataComp: In search of the next generation of multimodal datasets

October 5, 2023 · 2 min · long8v

[108] Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships

April 4, 2023 · 4 min · long8v

[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations

November 4, 2022 · 2 min · long8v

[19] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

April 6, 2022 · 1 min · long8v