[145] CLIPScore: A Reference-free Evaluation Metric for Image Captioning

February 5, 2024 · 3 min · long8v

[112] RoFormer: Enhanced Transformer with Rotary Position Embedding

April 26, 2023 · 2 min · long8v

[93] Mining the Benefits of Two-stage and One-stage HOI Detection

December 29, 2022 · 1 min · long8v

[38] Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

July 22, 2022 · 2 min · long8v

[32] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

June 28, 2022 · 2 min · long8v

[24] DINO: Emerging Properties in Self-Supervised Vision Transformers

April 26, 2022 · 5 min · long8v

[8] SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

January 24, 2022 · 1 min · long8v