[222] Qwen3-VL Technical Report

March 9, 2026 · 6 min · long8v

[221] Scaling Synthetic Data Creation with 1,000,000,000 Personas

January 19, 2026 · 1 min · long8v

[220] VideoRoPE: What Makes for Good Video Rotary Position Embedding?

November 25, 2025 · 2 min · long8v

[219] GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

November 12, 2025 · 4 min · long8v

[218] Qwen2.5-VL Technical Report

November 10, 2025 · 4 min · long8v

[217] PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

November 3, 2025 · 3 min · long8v

[216] Emerging Properties in Unified Multimodal Pretraining

September 4, 2025 · 4 min · long8v

[215] Group Sequence Policy Optimization

August 1, 2025 · 3 min · long8v

[214] Learning to Model the World With Language

July 17, 2025 · 4 min · long8v

[213] Skywork-R1V3 Technical Report

July 11, 2025 · 3 min · long8v

[211] Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

July 2, 2025 · 2 min · long8v

[212] MiMo-VL Technical Report

July 2, 2025 · 3 min · long8v

[210] Weight Ensembling Improves Reasoning in Language Models

May 30, 2025 · 2 min · long8v

[209] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

May 21, 2025 · 2 min · long8v

[208] FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models

March 27, 2025 · 1 min · long8v

[206] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

March 12, 2025 · 1 min · long8v

[207] MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

March 12, 2025 · 3 min · long8v

[205] LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

February 28, 2025 · 2 min · long8v

[204] DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

February 19, 2025 · 2 min · long8v

[203] DeepSeek-V3 Technical Report

February 13, 2025 · 2 min · long8v

[201] VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

February 8, 2025 · 3 min · long8v

[200] Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

February 3, 2025 · 2 min · long8v

[199] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

January 24, 2025 · 2 min · long8v

[198] Kimi k1.5: Scaling Reinforcement Learning with LLMs

January 23, 2025 · 4 min · long8v

[197] Free Process Rewards without Process Labels

January 20, 2025 · 1 min · long8v

[196] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

January 17, 2025 · 2 min · long8v

[195] STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning

January 9, 2025 · 1 min · long8v

[194] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

January 3, 2025 · 4 min · long8v

[193] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

December 30, 2024 · 3 min · long8v

[191] Critique-out-Loud Reward Models

December 17, 2024 · 2 min · long8v

[190] Solving math word problems with process and outcome-based feedback

December 16, 2024 · 4 min · long8v

[189] Training Verifiers to Solve Math Word Problems

December 9, 2024 · 1 min · long8v

read torch titan

December 4, 2024 · 0 min · long8v

[188] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

December 2, 2024 · 2 min · long8v

[187] Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

November 21, 2024 · 2 min · long8v

[186] The Llama 3 Herd of Models

November 15, 2024 · 8 min · long8v

[185] LLaVA-OneVision: Easy Visual Task Transfer

November 12, 2024 · 1 min · long8v

[184] Improve Vision Language Model Chain-of-thought Reasoning

October 29, 2024 · 2 min · long8v

[183] MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

October 24, 2024 · 2 min · long8v

[182] Calibrated Self-Rewarding Vision Language Models

October 10, 2024 · 2 min · long8v

[181] Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

October 7, 2024 · 2 min · long8v

[180] Phantom of Latent for Large Language and Vision Models

September 30, 2024 · 1 min · long8v

[179] Aligning Large Multimodal Models with Factually Augmented RLHF

September 25, 2024 · 2 min · long8v

[178] RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

September 23, 2024 · 3 min · long8v

[177] Fine-grained Image Captioning with CLIP Reward

September 6, 2024 · 2 min · long8v

[176] Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

September 5, 2024 · 2 min · long8v

[175] Dense Reward for Free in Reinforcement Learning from Human Feedback

September 4, 2024 · 2 min · long8v

[174] Evaluations for Object Hallucinations

September 2, 2024 · 2 min · long8v

[171] CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

August 30, 2024 · 2 min · long8v

[172] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

August 30, 2024 · 2 min · long8v

[173] Detecting and Preventing Hallucinations in Large Vision Language Models

August 30, 2024 · 2 min · long8v

[170] Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

August 27, 2024 · 2 min · long8v

[169] Direct Preference Optimization: Your Language Model is Secretly a Reward Model

August 26, 2024 · 1 min · long8v

[168] Proximal Policy Optimization Algorithms

August 21, 2024 · 2 min · long8v

[167] Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

July 24, 2024 · 2 min · long8v

[166] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

July 23, 2024 · 2 min · long8v

[165] Rich Human Feedback for Text-to-Image Generation

July 19, 2024 · 2 min · long8v

[163] What You See is What You Read? Improving Text-Image Alignment Evaluation

July 18, 2024 · 2 min · long8v

[164] TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

July 18, 2024 · 1 min · long8v

[162] CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

July 11, 2024 · 1 min · long8v

[161] MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks

July 9, 2024 · 1 min · long8v

[160] ALOHa: A New Measure for Hallucination in Captioning Models

June 15, 2024 · 2 min · long8v

[159] Long-CLIP: Unlocking the Long-Text Capability of CLIP

May 10, 2024 · 1 min · long8v

[158] A Mathematical Framework for Transformer Circuits

May 9, 2024 · 4 min · long8v

feat: add text span

May 7, 2024 · 1 min · long8v

[156] Interpreting CLIP's Image Representation via Text-Based Decomposition

May 6, 2024 · 2 min · long8v

[157] LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

May 6, 2024 · 2 min · long8v

feat: add LeGrad

May 6, 2024 · 1 min · long8v

[155] Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

May 3, 2024 · 2 min · long8v

feat: llava next hf implementation

April 23, 2024 · 1 min · long8v

[154] Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

April 3, 2024 · 2 min · long8v

[153] Contrastive Explanations for Model Interpretability

April 1, 2024 · 2 min · long8v

[152] Sigmoid Loss for Language Image Pre-Training

March 12, 2024 · 2 min · long8v

[151] FOIL it! Find One mismatch between Image and Language caption

March 3, 2024 · 3 min · long8v

[150] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

February 13, 2024 · 3 min · long8v

[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

February 12, 2024 · 1 min · long8v

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

February 11, 2024 · 2 min · long8v

[147] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

February 7, 2024 · 3 min · long8v

[146] Transformer Interpretability Beyond Attention Visualization

February 6, 2024 · 4 min · long8v

[145] CLIPScore: A Reference-free Evaluation Metric for Image Captioning

February 5, 2024 · 3 min · long8v

[144] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

December 26, 2023 · 2 min · long8v

[143] Honeybee: Locality-enhanced Projector for Multimodal LLM

December 22, 2023 · 3 min · long8v

[142] Trust Region Policy Optimization

December 17, 2023 · 1 min · long8v

[141] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

December 15, 2023 · 3 min · long8v

[140] Improved Baselines with Visual Instruction Tuning

December 12, 2023 · 3 min · long8v

[139] Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation

December 11, 2023 · 2 min · long8v

[138] ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

December 8, 2023 · 2 min · long8v

[137] mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

December 5, 2023 · 3 min · long8v

[136] Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models

November 28, 2023 · 3 min · long8v

[135] Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

November 23, 2023 · 3 min · long8v

[134] Asynchronous Methods for Deep Reinforcement Learning

October 18, 2023 · 4 min · long8v

[133] DataComp: In search of the next generation of multimodal datasets

October 5, 2023 · 2 min · long8v

[132] Hyperbolic Image-Text Representations

September 26, 2023 · 2 min · long8v

[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

September 13, 2023 · 2 min · long8v

[129] Grounding Language Models to Images for Multimodal Inputs and Outputs

September 4, 2023 · 1 min · long8v

[128] Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

August 21, 2023 · 3 min · long8v

[127] Linearly Mapping from Image to Text Space

August 17, 2023 · 2 min · long8v

[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

August 9, 2023 · 2 min · long8v

[125] RILS: Masked Visual Reconstruction in Language Semantic Space

August 2, 2023 · 2 min · long8v

feat: add sparse rcnn

July 24, 2023 · 1 min · long8v

[124] LiT: Zero-Shot Transfer with Locked-image text Tuning

July 6, 2023 · 4 min · long8v

[122] Neural Architecture Search without Training

June 28, 2023 · 2 min · long8v

[121] Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

June 23, 2023 · 3 min · long8v

feat: add open-clip

June 21, 2023 · 1 min · long8v

[120] Large-scale Bilingual Language-Image Contrastive Learning

June 19, 2023 · 3 min · long8v

[118] PaLI-X: On Scaling up a Multilingual Vision and Language Model

June 8, 2023 · 4 min · long8v

[117] Multimodal Chain-of-Thought Reasoning in Language Models

June 7, 2023 · 2 min · long8v

[116] Data Distributional Properties Drive Emergent In-Context Learning in Transformers

May 22, 2023 · 3 min · long8v

[115] ImageBind: One Embedding Space To Bind Them All

May 16, 2023 · 2 min · long8v

[114] MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

May 9, 2023 · 3 min · long8v

[113] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

April 27, 2023 · 3 min · long8v

[112] RoFormer: Enhanced Transformer with Rotary Position Embedding

April 26, 2023 · 2 min · long8v

[111] Perceiver IO: A General Architecture for Structured Inputs & Outputs

April 24, 2023 · 2 min · long8v

[110] Understanding the Role of Self Attention for Efficient Speech Recognition

April 17, 2023 · 2 min · long8v

[109] 🦩 Flamingo: a Visual Language Model for Few-Shot Learning

April 10, 2023 · 4 min · long8v

[108] Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships

April 4, 2023 · 4 min · long8v

[107] Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

March 30, 2023 · 2 min · long8v

[106] Prefix-Tuning: Optimizing Continuous Prompts for Generation

March 28, 2023 · 1 min · long8v

[105] LoRA: Low-Rank Adaptation of Large Language Models

March 27, 2023 · 2 min · long8v

[103] Deep Sets

March 20, 2023 · 3 min · long8v

[102] Attention Augmented Convolutional Networks

February 16, 2023 · 1 min · long8v

[101] Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

January 31, 2023 · 3 min · long8v

[100] An Overview of Multi-Task Learning in Deep Neural Networks

January 26, 2023 · 2 min · long8v

[99] LinkNet: Relational Embedding for Scene Graph

January 18, 2023 · 1 min · long8v

[98] Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

January 17, 2023 · 1 min · long8v

[96] Vision GNN: An Image is Worth Graph of Nodes

January 5, 2023 · 3 min · long8v

[95] Pixels to Graphs by Associative Embedding

January 4, 2023 · 2 min · long8v

[93] Mining the Benefits of Two-stage and One-stage HOI Detection

December 29, 2022 · 1 min · long8v

[92] Long-Tail Learning via Logit Adjustment

December 26, 2022 · 1 min · long8v

[91] Deep Residual Learning for Image Recognition

December 25, 2022 · 2 min · long8v

[90] Neural Collaborative Graph Machines for Table Structure Recognition

December 22, 2022 · 1 min · long8v

[89] Relational Attention: Generalizing Transformers for Graph-Structured Tasks

December 15, 2022 · 2 min · long8v

[87] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation

December 8, 2022 · 2 min · long8v

[86] Graph R-CNN for Scene Graph Generation

December 6, 2022 · 2 min · long8v

[78] Localization Uncertainty Estimation for Anchor-Free Object Detection

November 10, 2022 · 1 min · long8v

[77] Interpretable Image Classification with Differentiable Prototype Assignment

November 9, 2022 · 3 min · long8v

[75] SESS: Saliency Enhancing with Scaling and Sliding

November 8, 2022 · 2 min · long8v

[76] Long-tail Detection with Effective Class-Margins

November 8, 2022 · 2 min · long8v

[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations

November 4, 2022 · 2 min · long8v

[71] Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

October 17, 2022 · 1 min · long8v

[70] SSD: Single Shot MultiBox Detector

October 12, 2022 · 2 min · long8v

[68] Iterative Scene Graph Generation

October 5, 2022 · 2 min · long8v

[65] Margin Calibration for Long-Tailed Visual Recognition

September 19, 2022 · 2 min · long8v

[64] Open-Vocabulary DETR with Conditional Matching

September 16, 2022 · 2 min · long8v

[63] Masked Autoencoders Are Scalable Vision Learners

September 7, 2022 · 2 min · long8v

[62] What to Hide from Your Students: Attention-Guided Masked Image Modeling

September 6, 2022 · 1 min · long8v

[61] Generative Modeling by Estimating Gradients of the Data Distribution

September 3, 2022 · 1 min · long8v

[60] Efficient Sparsely Activated Transformers

September 2, 2022 · 1 min · long8v

[59] MLP-Mixer: An all-MLP Architecture for Vision

September 1, 2022 · 1 min · long8v

[58] MetaFormer Is Actually What You Need for Vision

August 31, 2022 · 1 min · long8v

[57] Learning Transferable Architectures for Scalable Image Recognition

August 30, 2022 · 1 min · long8v

[56] NICE: Non-linear Independent Components Estimation

August 27, 2022 · 1 min · long8v

[55] Position Prediction as an Effective Pretraining Strategy

August 26, 2022 · 1 min · long8v

[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

August 25, 2022 · 2 min · long8v

[53] InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

August 20, 2022 · 3 min · long8v

[51] Structured Sparse R-CNN for Direct Scene Graph Generation

August 19, 2022 · 3 min · long8v

[52] Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

August 19, 2022 · 3 min · long8v

[49] Sparse Graph Attention Networks

August 10, 2022 · 3 min · long8v

[48] SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection

August 9, 2022 · 1 min · long8v

[47] Recovering the Unbiased Scene Graphs from the Biased Ones

August 5, 2022 · 2 min · long8v

[45] BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation

August 3, 2022 · 1 min · long8v

[46] ReFormer: The Relational Transformer for Image Captioning

August 3, 2022 · 3 min · long8v

[44] Context-Aware Scene Graph Generation With Seq2Seq Transformers

August 2, 2022 · 3 min · long8v

[41] Panoptic Scene Graph Generation

August 1, 2022 · 1 min · long8v

[43] Relation Transformer Network

August 1, 2022 · 4 min · long8v

[40] Neural Discrete Representation Learning

July 30, 2022 · 1 min · long8v

[38] Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries

July 22, 2022 · 2 min · long8v

[37] Relationformer: A Unified Framework for Image-to-Graph Generation

July 21, 2022 · 2 min · long8v

RelTR code reading

July 21, 2022 · 1 min · long8v

[36] SGTR: End-to-end Scene Graph Generation with Transformer

July 19, 2022 · 3 min · long8v

[35] RelTR: Relation Transformer for Scene Graph Generation

July 18, 2022 · 5 min · long8v

[34] What Regularized Auto-Encoders Learn from the Data Generating Distribution

July 16, 2022 · 1 min · long8v

[32] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

June 28, 2022 · 2 min · long8v

[31] GIT: A Generative Image-to-text Transformer for Vision and Language

June 26, 2022 · 2 min · long8v

[30] CoCa: Contrastive Captioners are Image-Text Foundation Models

June 22, 2022 · 2 min · long8v

[28] Learning to Compare: Relation Network for Few-Shot Learning

May 31, 2022 · 2 min · long8v

[27] Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

May 23, 2022 · 3 min · long8v

MoEBERT code reading

May 23, 2022 · 1 min · long8v

[26] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

May 13, 2022 · 1 min · long8v

Sparse MoE code reading

May 10, 2022 · 1 min · long8v

[25] Intriguing Properties of Vision Transformers

April 29, 2022 · 3 min · long8v

[24] DINO: Emerging Properties in Self-Supervised Vision Transformers

April 26, 2022 · 5 min · long8v

[23] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

April 25, 2022 · 3 min · long8v

[22] Transformers without Tears: Improving the Normalization of Self-Attention

April 21, 2022 · 3 min · long8v

[21] cosFormer: Rethinking Softmax in Attention

April 20, 2022 · 3 min · long8v

[19] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

April 6, 2022 · 1 min · long8v

[18] Deep Learning with Differential Privacy

April 4, 2022 · 1 min · long8v

[17] Membership Inference Attacks Against Machine Learning Models

March 28, 2022 · 1 min · long8v

[16] Counterfactual Memorization in Neural Language Models

March 25, 2022 · 3 min · long8v

[15] Quantifying Memorization Across Neural Language Models

March 24, 2022 · 3 min · long8v

[14] Longformer: The Long-Document Transformer

February 22, 2022 · 2 min · long8v

[12] BBPE: Neural Machine Translation with Byte-Level Subwords

February 18, 2022 · 3 min · long8v

[9] SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

January 25, 2022 · 3 min · long8v

[8] SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

January 24, 2022 · 1 min · long8v

[7] SLIP: Self-supervision meets Language-Image Pre-training

January 20, 2022 · 1 min · long8v

[6] Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

January 18, 2022 · 1 min · long8v

[5] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

January 13, 2022 · 1 min · long8v

[4] Conditional Positional Encodings for Vision Transformers

January 12, 2022 · 1 min · long8v

[3] Twins: Revisiting the Design of Spatial Attention in Vision Transformers

January 10, 2022 · 1 min · long8v

[2] ELSA: Enhanced Local Self-Attention for Vision Transformer

January 7, 2022 · 1 min · long8v

[1] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

January 5, 2022 · 1 min · long8v