[147] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (ICCV 2021) - Q1, XAI
[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels (CVPR 2021) - Q1, naver
[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision - multimodal, 2021 Q1, 25min, kakao
[82] Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors (ICLR 2021) - Q1, object detection, uncertainty, later..
[5] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - ViT backbone, 2021 Q1, re-read