[182] Calibrated Self-Rewarding Vision Language Models

October 10, 2024 · 1 min · long8v

[167] Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

July 24, 2024 · 2 min · long8v

[163] What You See is What You Read? Improving Text-Image Alignment Evaluation

July 18, 2024 · 1 min · long8v

[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

November 23, 2023 · 2 min · long8v

[119] Visual Instruction Tuning

June 9, 2023 · 3 min · long8v

[116] Data Distributional Properties Drive Emergent In-Context Learning in Transformers

May 22, 2023 · 2 min · long8v

[103] Deep Sets

March 20, 2023 · 3 min · long8v

[99] LinkNet: Relational Embedding for Scene Graph

January 18, 2023 · 1 min · long8v

[98] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

January 17, 2023 · 1 min · long8v

[97] Contrastive Language-Image Pre-Training with Knowledge Graph

January 12, 2023 · 2 min · long8v

[96] Vision GNN: An Image is Worth Graph of Nodes

January 5, 2023 · 3 min · long8v

[95] Pixels to Graphs by Associative Embedding

January 4, 2023 · 2 min · long8v

[94] Recipe for a General, Powerful, Scalable Graph Transformer

January 3, 2023 · 1 min · long8v

[93] Mining the Benefits of Two-stage and One-stage HOI Detection

December 29, 2022 · 1 min · long8v

[48] SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection

August 9, 2022 · 1 min · long8v

[25] Intriguing Properties of Vision Transformers

April 29, 2022 · 2 min · long8v