image

paper, code

TL;DR

  • task : object detection, pose estimation, object tracking, label assignment
  • problem : DETR's Hungarian one-to-one matching removes the need for NMS and similar post-processing, but it leaves too few positive pairs (queries matched to a ground truth) to learn efficiently.
  • idea : hybrid matching. Keep the one-to-one matching and add one-to-many matching (simply repeat the ground truths multiple times). Apply this at every decoder layer, like an auxiliary loss.
  • architecture : deformable DETR
  • objective : objective function for each task
  • baseline : deformable DETR, PETR, 3DETR…
  • data :
  • result : Performance gain. Training took about 65 minutes per epoch with one-to-one matching and about 85 minutes with hybrid matching (roughly 30% longer).
  • contribution : Improved performance with a simple trick.
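
The matching idea above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names `one_to_one_match` and `one_to_many_match` are hypothetical, the cost matrix is made up, and a brute-force search stands in for the Hungarian algorithm so that the example needs only the standard library (it is only practical for tiny inputs).

```python
from itertools import permutations


def one_to_one_match(cost):
    """Optimal one-to-one assignment (brute-force stand-in for Hungarian).

    cost[q][g] is the cost of assigning query q to ground truth g.
    Assumes num_queries >= num_gt. Returns sorted (query_idx, gt_idx) pairs.
    """
    num_queries = len(cost)
    num_gt = len(cost[0]) if cost else 0
    best_total, best_pairs = float("inf"), []
    # Try every way of picking one distinct query per ground truth.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = [(q, g) for g, q in enumerate(queries)]
    return sorted(best_pairs)


def one_to_many_match(cost, k):
    """One-to-many matching by repeating every ground truth k times.

    Each row is tiled k times, so column c corresponds to ground truth
    c % num_gt. Matching is then ordinary one-to-one matching on the
    widened matrix, and indices are mapped back to the original ground truths.
    """
    num_gt = len(cost[0])
    repeated = [row * k for row in cost]
    pairs = one_to_one_match(repeated)
    return sorted((q, g % num_gt) for q, g in pairs)


# Hypothetical costs for 4 queries and 2 ground truths.
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.8, 0.3],
]
print(one_to_one_match(cost))      # 2 positive pairs, one per ground truth
print(one_to_many_match(cost, 2))  # 4 positive pairs, two per ground truth
```

With one-to-one matching only two of the four queries receive a positive label; repeating the ground truths twice makes all four positive, which is the extra supervision hybrid matching is after.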

Details

Different ways to do hybrid matching

image

Results

image

image

Group DETR: Fast Training Convergence with Decoupled One-to-Many Label Assignment

image

Group DETR divides the query embeddings into K groups and allows queries to interact only with other queries in the same group.
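
One way to restrict interaction to within a group is a block-diagonal attention mask. The sketch below is an assumption about how this could be done, not Group DETR's actual code; `group_attention_mask` is a hypothetical helper, and `True` here means "blocked".

```python
def group_attention_mask(queries_per_group, num_groups):
    """Build a boolean mask where True blocks attention between two queries.

    Queries are laid out group by group, so query i belongs to group
    i // queries_per_group. A pair is blocked when the groups differ.
    """
    total = queries_per_group * num_groups
    return [
        [(i // queries_per_group) != (j // queries_per_group) for j in range(total)]
        for i in range(total)
    ]


# 2 groups of 2 queries each: queries 0-1 and 2-3 cannot see each other.
mask = group_attention_mask(queries_per_group=2, num_groups=2)
for row in mask:
    print(["x" if blocked else "." for blocked in row])
```

Each group then behaves like an independent set of queries over the same image features, so every group can be matched to the ground truths separately.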