[208] FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | 25min RL 2025Q1
[206] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | 25min RL MLLM 2025Q1
[200] Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | 25min RL 2025Q1 THU
[161] MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks | 25min 2022Q4 XAI ACL
[149] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning | ICCV 25min 2022Q4 kakao
[148] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | ICCV 25min CLIP 2023Q3 AI2
[126] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | multimodal 2021Q1 25min kakao
[110] Understanding the Role of Self Attention for Efficient Speech Recognition | 2022Q1 ICLR 25min transformer
[74] “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations | dataset 2022Q3 25min ECCV nvidia CLIP
[73] Simple Open-Vocabulary Object Detection with Vision Transformers | google object detection 2022Q2 25min ECCV OV
[71] Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers | 25min sparse 2022Q4 transformer
[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models | LM MoE 2022Q3 25min
[48] SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection | 2020Q1 long NeurIPS graph 25min