[169] Direct Preference Optimization: Your Language Model is Secretly a Reward Model

August 26, 2024 · 1 min · long8v

[163] What You See is What You Read? Improving Text-Image Alignment Evaluation

July 18, 2024 · 1 min · long8v

[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

November 23, 2023 · 2 min · long8v

[133] DataComp: In search of the next generation of multimodal datasets

October 5, 2023 · 2 min · long8v

[132] Hyperbolic Image-Text Representations

September 26, 2023 · 2 min · long8v

[130] Segment Anything

September 4, 2023 · 1 min · long8v

[119] Visual Instruction Tuning

June 9, 2023 · 3 min · long8v

[118] PaLI-X: On Scaling up a Multilingual Vision and Language Model

June 8, 2023 · 3 min · long8v

[115] ImageBind: One Embedding Space To Bind Them All

May 16, 2023 · 1 min · long8v