problem : Test whether self-supervised learning works well with the CLIP model structure.
solution : Combine two losses: a contrastive loss between images and text (as in CLIP) and a self-supervised loss on images alone.
result : Evaluated with linear probing (classification by attaching a linear layer to a frozen representation; only the linear layer is trained — the 'feature-based' approach mentioned in BERT), zero-shot transfer, and end-to-end fine-tuning, reaching SOTA.
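The combined objective in the solution above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: it assumes the image-text term is CLIP's symmetric InfoNCE loss and the image-only term is a SimCLR-style NT-Xent loss over two augmented views; the function names, temperatures, and the `ssl_weight` balancing coefficient are all my own placeholders.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cross_entropy(logits, targets):
    # logits: (N, M) similarity scores; targets: (N,) index of the positive per row
    logits = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE: matched image-text pairs (the diagonal) are positives.
    img, txt = l2_normalize(img_emb), l2_normalize(txt_emb)
    logits = img @ txt.T / temperature
    targets = np.arange(len(img))
    return 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))

def simclr_loss(z1, z2, temperature=0.1):
    # NT-Xent: two augmented views of the same image are positives,
    # all other images in the batch are negatives.
    z = l2_normalize(np.concatenate([z1, z2]))
    n = len(z1)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                               # exclude self-similarity
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return cross_entropy(sim, targets)

def combined_loss(img_emb, txt_emb, view1_emb, view2_emb, ssl_weight=1.0):
    # Image-text contrastive loss plus a weighted image-only self-supervised loss.
    return clip_loss(img_emb, txt_emb) + ssl_weight * simclr_loss(view1_emb, view2_emb)
```

With aligned image and text embeddings the CLIP term drops toward zero, while mismatched pairs keep it high; the `ssl_weight` knob (a hypothetical name) controls how strongly the image-only term shapes the representation.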
