[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

February 12, 2024 · 1 min · long8v

[143] Honeybee: Locality-enhanced Projector for Multimodal LLM

December 22, 2023 · 3 min · long8v

[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

August 9, 2023 · 1 min · long8v

[72] Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

October 20, 2022 · 1 min · long8v