[208] FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models

March 27, 2025 · 1 min · long8v

[206] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

March 12, 2025 · 1 min · long8v

[204] DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL

February 19, 2025 · 2 min · long8v

[203] DeepSeek-V3 Technical Report

February 13, 2025 · 2 min · long8v

[200] Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

February 3, 2025 · 2 min · long8v

[197] Free Process Rewards without Process Labels

January 20, 2025 · 1 min · long8v

[195] STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning

January 9, 2025 · 1 min · long8v

[189] Training Verifiers to Solve Math Word Problems

December 9, 2024 · 1 min · long8v

[185] LLaVA-OneVision: Easy Visual Task Transfer

November 12, 2024 · 1 min · long8v

[182] Calibrated Self-Rewarding Vision Language Models

October 10, 2024 · 2 min · long8v

[179] Aligning Large Multimodal Models with Factually Augmented RLHF

September 25, 2024 · 2 min · long8v

[177] Fine-grained Image Captioning with CLIP Reward

September 6, 2024 · 2 min · long8v

[162] CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

July 11, 2024 · 1 min · long8v

[161] MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks

July 9, 2024 · 1 min · long8v

[159] Long-CLIP: Unlocking the Long-Text Capability of CLIP

May 10, 2024 · 1 min · long8v

[152] Sigmoid Loss for Language Image Pre-Training

March 12, 2024 · 2 min · long8v

[150] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

February 13, 2024 · 3 min · long8v

[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

February 12, 2024 · 1 min · long8v

[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

February 11, 2024 · 2 min · long8v

[129] Grounding Language Models to Images for Multimodal Inputs and Outputs

September 4, 2023 · 1 min · long8v

[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

August 9, 2023 · 2 min · long8v

[115] ImageBind: One Embedding Space To Bind Them All

May 16, 2023 · 2 min · long8v

[110] Understanding the Role of Self Attention for Efficient Speech Recognition

April 17, 2023 · 2 min · long8v

[106] Prefix-Tuning: Optimizing Continuous Prompts for Generation

March 28, 2023 · 1 min · long8v

[102] Attention Augmented Convolutional Networks

February 16, 2023 · 1 min · long8v

[93] Mining the Benefits of Two-stage and One-stage HOI Detection

December 29, 2022 · 1 min · long8v

[92] Long-Tail Learning via Logit Adjustment

December 26, 2022 · 1 min · long8v

[78] Localization Uncertainty Estimation for Anchor-Free Object Detection

November 10, 2022 · 1 min · long8v

[75] SESS: Saliency Enhancing with Scaling and Sliding

November 8, 2022 · 2 min · long8v

[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations

November 4, 2022 · 2 min · long8v

[71] Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

October 17, 2022 · 1 min · long8v

[65] Margin Calibration for Long-Tailed Visual Recognition

September 19, 2022 · 2 min · long8v

[63] Masked Autoencoders Are Scalable Vision Learners

September 7, 2022 · 2 min · long8v

[62] What to Hide from Your Students: Attention-Guided Masked Image Modeling

September 6, 2022 · 1 min · long8v

[60] Efficient Sparsely Activated Transformers

September 2, 2022 · 1 min · long8v

[59] MLP-Mixer: An all-MLP Architecture for Vision

September 1, 2022 · 1 min · long8v

[58] MetaFormer Is Actually What You Need for Vision

August 31, 2022 · 1 min · long8v

[57] Learning Transferable Architectures for Scalable Image Recognition

August 30, 2022 · 1 min · long8v

[55] Position Prediction as an Effective Pretraining Strategy

August 26, 2022 · 1 min · long8v

[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

August 25, 2022 · 2 min · long8v

[48] SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection

August 9, 2022 · 1 min · long8v

[47] Recovering the Unbiased Scene Graphs from the Biased Ones

August 5, 2022 · 2 min · long8v

[41] Panoptic Scene Graph Generation

August 1, 2022 · 1 min · long8v