[96] Vision GNN: An Image is Worth Graph of Nodes

January 5, 2023 · 3 min · long8v

[59] MLP-Mixer: An all-MLP Architecture for Vision

September 1, 2022 · 1 min · long8v

[58] MetaFormer Is Actually What You Need for Vision

August 31, 2022 · 1 min · long8v

[30] CoCa: Contrastive Captioners are Image-Text Foundation Models

June 22, 2022 · 1 min · long8v

[6] Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

January 18, 2022 · 1 min · long8v

[5] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

January 13, 2022 · 1 min · long8v

[3] Twins: Revisiting the Design of Spatial Attention in Vision Transformers

January 10, 2022 · 1 min · long8v

[1] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

January 5, 2022 · 1 min · long8v