
TL;DR
- task : object detection, pose estimation, object tracking, label assignment
- problem : DETR's Hungarian one-to-one matching removes the need for NMS and similar post-processing, but it assigns only one query to each ground-truth object, so there are too few positive pairs to learn efficiently.
- idea : hybrid matching. Keep the one-to-one matching and add one-to-many matching, implemented by simply repeating each ground truth multiple times. Apply this at every decoder layer, like an auxiliary loss.
- architecture : deformable DETR
- objective : objective function for each task
- baseline : deformable DETR, PETR, 3DETR…
- data : …
- result : Performance gain. Training time was about 65 minutes per epoch with one-to-one matching and about 85 minutes with hybrid matching.
- contribution : Improved performance with a simple trick.
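
The matching step in the idea above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `min_cost_assignment` and `hybrid_match` are hypothetical names, the cost values are made up, the two branches are assumed to use separate query sets, and a brute-force search stands in for the Hungarian algorithm (in practice something like `scipy.optimize.linear_sum_assignment` would be used).

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Brute-force optimal one-to-one assignment (stand-in for Hungarian matching).

    cost[q][t] is the cost of assigning query q to target t.
    Requires at least as many queries as targets; only viable for tiny inputs.
    Returns a list of (query_index, target_index) pairs.
    """
    num_queries = len(cost)
    num_targets = len(cost[0])
    best_total, best_queries = float("inf"), ()
    # Try every ordered choice of distinct queries, one per target.
    for queries in permutations(range(num_queries), num_targets):
        total = sum(cost[q][t] for t, q in enumerate(queries))
        if total < best_total:
            best_total, best_queries = total, queries
    return [(q, t) for t, q in enumerate(best_queries)]


def hybrid_match(cost_o2o, cost_o2m, k):
    """Return (one_to_one, one_to_many) lists of (query_index, gt_index) pairs.

    cost_o2o: query-to-GT costs for the one-to-one branch.
    cost_o2m: query-to-GT costs for the one-to-many branch (assumed separate queries).
    k: number of times each ground truth is repeated (hypothetical parameter name).
    """
    one_to_one = min_cost_assignment(cost_o2o)

    num_gt = len(cost_o2m[0])
    # "Copy GT multiple times": tile the GT columns k times, then match one-to-one
    # against the repeated targets, so each GT ends up with k positive queries.
    repeated = [row * k for row in cost_o2m]
    matched = min_cost_assignment(repeated)
    # Column t of the tiled matrix is a copy of GT index t % num_gt.
    one_to_many = [(q, t % num_gt) for q, t in matched]
    return one_to_one, one_to_many


# Toy example: 2 ground-truth objects, 3 one-to-one queries, 4 one-to-many queries.
cost_o2o = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.6]]
cost_o2m = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.7, 0.3]]
one_to_one, one_to_many = hybrid_match(cost_o2o, cost_o2m, k=2)
```

With `k=2`, each ground truth receives one positive query in the one-to-one branch and two in the one-to-many branch. The losses from both branches would then be summed at every decoder layer.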
Details
Different ways to do hybrid matching

Results


Related works
Group DETR: Fast Training Convergence with Decoupled One-to-Many Label Assignment

The object queries are divided into K groups, and queries are allowed to interact only with other queries in the same group.
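
One way to picture the within-group restriction is as a block-diagonal mask on decoder self-attention. The sketch below is an assumption about how this could be expressed, not Group DETR's actual code: `group_self_attention_mask` is a hypothetical helper, and it uses the convention that `True` means "blocked" (the same convention as boolean `attn_mask` in PyTorch's `nn.MultiheadAttention`).

```python
def group_self_attention_mask(num_groups, queries_per_group):
    """Build a boolean self-attention mask for grouped queries.

    Queries are assumed to be laid out group by group. Entry [i][j] is True
    when query i must NOT attend to query j, i.e. when they are in
    different groups.
    """
    total = num_groups * queries_per_group
    return [
        [(i // queries_per_group) != (j // queries_per_group) for j in range(total)]
        for i in range(total)
    ]


# Toy example: K = 2 groups of 2 queries each gives a 4 x 4 block-diagonal mask.
mask = group_self_attention_mask(num_groups=2, queries_per_group=2)
```

In this toy mask, queries 0 and 1 can attend to each other, as can queries 2 and 3, while all cross-group entries are blocked.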