CLIP | 🍎 Paper Today I Read 🦔

[162] CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

AAAI 2022Q3 25min CLIP

[159] Long-CLIP: Unlocking the Long-Text Capability of CLIP

25min CLIP 2024Q1

[156] Interpreting CLIP's Image Representation via Text-Based Decomposition

ICLR CLIP XAI 2023Q4

[157] LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

CLIP XAI 2024Q2

[152] Sigmoid Loss for Language Image Pre-Training

25min CLIP 2023Q1

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

ICCV 25min CLIP 2023Q3 AI2

[145] CLIPScore: A Reference-free Evaluation Metric for Image Captioning

2021Q2 CLIP emnlp evaluation AI2

[141] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

multimodal CLIP 2023Q4

[133] DataComp: In search of the next generation of multimodal datasets

dataset CLIP 2023Q2

[132] Hyperbolic Image-Text Representations

ICML CLIP 2023Q2 meta

[125] RILS: Masked Visual Reconstruction in Language Semantic Space

CVPR CLIP 2023Q1

[124] LiT: Zero-Shot Transfer with Locked-image text Tuning

2021Q4 google CLIP

[123] Robust fine-tuning of zero-shot models

openAI google CVPR 2022Q3 CLIP domainshift

[121] Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

multimodal CLIP 2023Q1 retrieval

[120] Large-scale Bilingual Language-Image Contrastive Learning

2022Q1 CLIP multilingual

[98] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

NeurIPS object detection 2022Q3 CLIP

[97] Contrastive Language-Image Pre-Training with Knowledge Graphs

multimodal NeurIPS graph 2022Q4 CLIP

[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations

dataset 2022Q3 25min ECCV nvidia CLIP

[10] CLIP: Connecting Text and Images

multimodal 2021Q1 few-shot SSL zero-shot CLIP