[121] Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities | multimodal CLIP 2023Q1 retrieval
[113] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | multimodal 2023Q1 salesforce