[182] Calibrated Self-Rewarding Vision Language Models

October 10, 2024 · 2 min · long8v

[167] Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

July 24, 2024 · 2 min · long8v

[163] What You See is What You Read? Improving Text-Image Alignment Evaluation

July 18, 2024 · 2 min · long8v

[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

November 23, 2023 · 3 min · long8v

[116] Data Distributional Properties Drive Emergent In-Context Learning in Transformers

May 22, 2023 · 3 min · long8v

[103] Deep Sets

March 20, 2023 · 3 min · long8v

[99] LinkNet: Relational Embedding for Scene Graph

January 18, 2023 · 1 min · long8v

[98] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

January 17, 2023 · 1 min · long8v

[96] Vision GNN: An Image is Worth Graph of Nodes

January 5, 2023 · 3 min · long8v

[95] Pixels to Graphs by Associative Embedding

January 4, 2023 · 2 min · long8v

[93] Mining the Benefits of Two-stage and One-stage HOI Detection

December 29, 2022 · 1 min · long8v

[48] SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection

August 9, 2022 · 1 min · long8v

[25] Intriguing Properties of Vision Transformers

April 29, 2022 · 3 min · long8v