[42] DETRs with Hybrid Matching

TL;DR

task : object detection, pose estimation, object tracking, label assignment
problem : DETR's Hungarian one-to-one matching removes the need for NMS and similar post-processing, but it leaves too few positive pairs (queries matched to a ground-truth object) to train efficiently.
idea : hybrid matching. Keep the one-to-one matching and add one-to-many matching (simply repeat each gt multiple times before matching). Apply this at every decoder layer, like an auxiliary loss.
architecture : Deformable DETR
objective : the usual objective function of each task
baseline : Deformable DETR, PETR, 3DETR…
data : …
result : Performance gain. Training took about 65 minutes per epoch with one-to-one matching and about 85 minutes with hybrid matching.
contribution : Improved performance with a simple trick.
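
The "copy gt multiple times" idea can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `one_to_one_match` and `one_to_many_match`, the toy cost matrix, and the repeat count `k` are all hypothetical. A brute-force search stands in for the Hungarian algorithm so the sketch needs only the standard library, and both matchings reuse one cost matrix purely for illustration. It assumes there are at least as many queries as (repeated) gt entries.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Optimal one-to-one assignment; cost[q][g] is the cost of pairing query q with gt g.

    Brute-force stand-in for the Hungarian algorithm (tiny inputs only).
    Returns a list of (query_idx, gt_idx) pairs sorted by query index.
    """
    num_queries = len(cost)
    num_gt = len(cost[0]) if cost else 0
    best, best_total = (), float("inf")
    # Try every way of picking one distinct query per gt column.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best, best_total = queries, total
    return sorted((q, g) for g, q in enumerate(best))


def one_to_many_match(cost, k):
    """Repeat every gt column k times, then run one-to-one matching on the result.

    Each gt can now be paired with up to k queries.
    """
    num_gt = len(cost[0]) if cost else 0
    repeated = [row * k for row in cost]  # columns: gt0..gtN-1, gt0..gtN-1, ...
    pairs = one_to_one_match(repeated)
    # Map repeated column indices back to the original gt indices.
    return [(q, g % num_gt) for q, g in pairs]


# Toy example: 4 queries, 2 gt objects (hypothetical costs).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.8, 0.3],
]
print(one_to_one_match(cost))      # 2 positive queries
print(one_to_many_match(cost, 2))  # 4 positive queries
```

With one-to-one matching only two of the four queries become positives; repeating the gt twice lets all four receive a positive target, which is the effect the hybrid scheme relies on.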

Query embeddings can be divided into K groups, with queries allowed to interact only within their own group.
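
One way to restrict interaction to within a group is a block-diagonal self-attention mask. The sketch below is an assumption about how this could be done, not code from the paper: the helper name `group_attention_mask` is hypothetical, groups are assumed to be equal-sized and contiguous, and `True` is taken to mean "blocked".

```python
def group_attention_mask(num_queries, num_groups):
    """Build a boolean mask where mask[i][j] is True if query i must NOT attend to query j.

    Queries are split into num_groups contiguous, equal-sized groups
    (an assumption of this sketch); attention is allowed only within a group.
    """
    if num_groups <= 0 or num_queries % num_groups != 0:
        raise ValueError("num_queries must be a positive multiple of num_groups")
    size = num_queries // num_groups
    return [
        [(i // size) != (j // size) for j in range(num_queries)]
        for i in range(num_queries)
    ]


# 6 queries in K = 3 groups: {0, 1}, {2, 3}, {4, 5}.
mask = group_attention_mask(6, 3)
for row in mask:
    print("".join("x" if blocked else "." for blocked in row))
```

The mask would then be passed to the decoder's self-attention in whatever form the framework expects; some APIs use the opposite convention (`True` = allowed), so it may need to be inverted.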