[145] CLIPScore: A Reference-free Evaluation Metric for Image Captioning

February 5, 2024 · 2 min · long8v

[112] RoFormer: Enhanced Transformer with Rotary Position Embedding

April 26, 2023 · 1 min · long8v

[111] Perceiver IO: A General Architecture for Structured Inputs & Outputs

April 24, 2023 · 2 min · long8v

[105] LoRA: Low-Rank Adaptation of Large Language Models

March 27, 2023 · 2 min · long8v

[93] Mining the Benefits of Two-stage and One-stage HOI Detection

December 29, 2022 · 1 min · long8v

[85] Dynamic Head: Unifying Object Detection Heads with Attentions

December 1, 2022 · 2 min · long8v

[59] MLP-Mixer: An all-MLP Architecture for Vision

September 1, 2022 · 1 min · long8v

[38] Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

July 22, 2022 · 2 min · long8v

[32] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

June 28, 2022 · 1 min · long8v

[24] DINO: Emerging Properties in Self-Supervised Vision Transformers

April 26, 2022 · 4 min · long8v

[8] SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

January 24, 2022 · 1 min · long8v