[153] Contrastive Explanations for Model Interpretability

April 1, 2024 · 2 min · long8v

[147] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

February 7, 2024 · 2 min · long8v

[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

September 13, 2023 · 2 min · long8v

[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

August 9, 2023 · 1 min · long8v

feat: add open-clip

June 21, 2023 · 1 min · long8v

[106] Prefix-Tuning: Optimizing Continuous Prompts for Generation

March 28, 2023 · 1 min · long8v

[11] DALL-E: Zero-Shot Text-to-Image Generation

February 7, 2022 · 2 min · long8v

[5] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

January 13, 2022 · 1 min · long8v

[4] Conditional Positional Encodings for Vision Transformers

January 12, 2022 · 1 min · long8v

[3] Twins: Revisiting the Design of Spatial Attention in Vision Transformers

January 10, 2022 · 1 min · long8v