[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning ICCV 25min 2022Q4 kakao
[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision multimodal 2021Q1 25min kakao
[72] Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity 2021Q4 ICLR object detection sparse kakao