[29] Grounded Language-Image Pre-training | multimodal 2021Q4 few-shot zero-shot microsoft object detection
[8] SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | multimodal SSL 2021Q2 zero-shot