[157] LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity | CLIP XAI 2024Q2
[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | ICCV 25min CLIP 2023Q3 AI2
[145] CLIPScore: A Reference-free Evaluation Metric for Image Captioning | 2021Q2 CLIP emnlp evaluation AI2
[121] Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities | multimodal CLIP 2023Q1 retrieval
[98] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | NeurIPS object detection 2022Q3 CLIP
[97] Contrastive Language-Image Pre-Training with Knowledge Graph | multimodal NeurIPS graph 2022Q4 CLIP
[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations | dataset 2022Q3 25min ECCV nvidia CLIP