[169] Direct Preference Optimization: Your Language Model is Secretly a Reward Model

August 26, 2024 · 1 min · long8v

[163] What You See is What You Read? Improving Text-Image Alignment Evaluation

July 18, 2024 · 1 min · long8v

[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

November 23, 2023 · 2 min · long8v

[133] DataComp: In search of the next generation of multimodal datasets

October 5, 2023 · 2 min · long8v

[132] Hyperbolic Image-Text Representations

September 26, 2023 · 2 min · long8v

[130] Segment Anything

September 4, 2023 · 1 min · long8v

[119] Visual Instruction Tuning

June 9, 2023 · 3 min · long8v

[118] PaLI-X: On Scaling up a Multilingual Vision and Language Model

June 8, 2023 · 3 min · long8v

[115] ImageBind: One Embedding Space To Bind Them All

May 16, 2023 · 1 min · long8v