[128] Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | ICML google 2022Q3 document
[98] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | NeurIPS object detection 2022Q3 CLIP
[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations | dataset 2022Q3 25min ECCV nvidia CLIP
[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models | LM MoE 2022Q3 25min