[162] CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

July 11, 2024 · 1 min · long8v

[128] Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

August 21, 2023 · 3 min · long8v

[98] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

January 17, 2023 · 1 min · long8v

[77] Interpretable Image Classification with Differentiable Prototype Assignment

November 9, 2022 · 3 min · long8v

[75] SESS: Saliency Enhancing with Scaling and Sliding

November 8, 2022 · 2 min · long8v

[76] Long-tail Detection with Effective Class-Margins

November 8, 2022 · 2 min · long8v

[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations

November 4, 2022 · 2 min · long8v

[68] Iterative Scene Graph Generation

October 5, 2022 · 2 min · long8v

[60] Efficient Sparsely Activated Transformers

September 2, 2022 · 1 min · long8v

[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

August 25, 2022 · 2 min · long8v