task : Proposes a new segmentation-based SGG task, panoptic scene graph generation (PSG).
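A rough sketch of what a panoptic scene graph could look like as a data structure, assuming each node is a panoptic segment (a thing instance or a stuff region) and each edge is a (subject, predicate, object) triplet over segment indices. The class names `Segment` and `PanopticSceneGraph` are hypothetical and not taken from the paper or its code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    # One panoptic segment (hypothetical layout): a thing instance or a stuff region.
    category: str
    is_thing: bool
    pixels: frozenset  # set of (row, col) coordinates belonging to this segment


@dataclass
class PanopticSceneGraph:
    segments: List[Segment] = field(default_factory=list)
    # Each relation is (subject_index, predicate, object_index) into `segments`.
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def is_panoptic(self) -> bool:
        """Check the panoptic constraint: no pixel is assigned to two segments."""
        seen = set()
        for seg in self.segments:
            if seen & seg.pixels:
                return False
            seen |= seg.pixels
        return True

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return relations as readable (subject, predicate, object) category triplets."""
        return [(self.segments[s].category, p, self.segments[o].category)
                for s, p, o in self.relations]


person = Segment("person", True, frozenset({(0, 0), (0, 1)}))
grass = Segment("grass", False, frozenset({(1, 0), (1, 1)}))  # stuff region
graph = PanopticSceneGraph([person, grass], [(0, "standing on", 1)])
print(graph.is_panoptic())  # True
print(graph.triplets())     # [('person', 'standing on', 'grass')]
```

Because nodes are masks rather than boxes, stuff regions such as "grass" can appear as subjects or objects on equal footing with things.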
problem : Many datasets have been proposed for SGG, but bbox-based SGG is problematic: boxes carry a lot of redundant or trivial information (e.g. hair) and leave out the background (stuff regions).
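One way to see how coarse a box can be: for a thin or diagonal object, most pixels inside its tightest bounding box do not belong to the object. The toy mask below is an illustrative assumption, not data from the paper.

```python
def mask_to_bbox(mask):
    """Tightest axis-aligned box (r0, c0, r1, c1), inclusive, around True pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def box_fill_ratio(mask):
    """Fraction of pixels inside the bbox that actually belong to the object."""
    r0, c0, r1, c1 = mask_to_bbox(mask)
    box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
    obj_area = sum(sum(row) for row in mask)  # True counts as 1
    return obj_area / box_area


# A thin diagonal object (think of a pole or an arm) on a 5x5 grid.
diagonal = [[r == c for c in range(5)] for r in range(5)]
print(mask_to_bbox(diagonal))    # (0, 0, 4, 4)
print(box_fill_ratio(diagonal))  # 0.2
```

Here 80% of the box is non-object pixels, which a mask-based representation avoids.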
architecture : (one-stage baselines) 1) PSGTR: adds triplet queries to DETR and decodes each (subject, predicate, object) triplet directly from a single query. 2) PSGFormer: uses separate relation queries and object queries; for each relation query, the object queries most related to it by cosine similarity are selected as its subject and object, and two FFN layers are added to form the triplet.
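A minimal sketch of the PSGFormer-style matching step, assuming each relation query has already been mapped to a subject embedding and an object embedding (for example by the two FFNs mentioned above; those heads are not modeled here) and that each object query has an embedding of the same size. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def match_relations_to_objects(
    rel_subj_emb: List[Sequence[float]],
    rel_obj_emb: List[Sequence[float]],
    obj_emb: List[Sequence[float]],
) -> List[Tuple[int, int]]:
    """For each relation query, pick the object query most similar to its
    subject embedding and the one most similar to its object embedding."""
    pairs = []
    for s_vec, o_vec in zip(rel_subj_emb, rel_obj_emb):
        subj = max(range(len(obj_emb)), key=lambda j: cosine(s_vec, obj_emb[j]))
        obj = max(range(len(obj_emb)), key=lambda j: cosine(o_vec, obj_emb[j]))
        pairs.append((subj, obj))
    return pairs


# Toy object-query embeddings, e.g. person, horse, grass.
objects = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
subj_emb = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2]]
obj_emb_rel = [[0.1, 0.8, 0.1], [0.0, 0.1, 1.0]]
print(match_relations_to_objects(subj_emb, obj_emb_rel, objects))  # [(0, 1), (1, 2)]
```

The design decouples "what relation" (relation queries) from "which segments" (object queries), so one object query can take part in several relations, unlike PSGTR where every triplet query predicts its own subject and object.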
objective : SGG triplet loss. Open question: is the bbox loss replaced with a segmentation (mask) loss?
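A sketch of what a per-triplet loss could look like under the assumption that mask losses take the place of box losses: cross-entropy on the subject, object and predicate classes plus a Dice loss on the subject and object masks (Dice is a standard mask loss in DETR-style segmentation). The exact terms and weights used in the paper are not confirmed here; `w_mask` and the dictionary keys are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one prediction, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def dice_loss(pred: Sequence[float], gt: Sequence[int], eps: float = 1.0) -> float:
    """Soft Dice loss; pred holds per-pixel probabilities, gt a 0/1 mask (both flattened)."""
    inter = sum(p * g for p, g in zip(pred, gt))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(gt) + eps)


def triplet_loss(pred: dict, gt: dict, w_mask: float = 1.0) -> float:
    """Classification terms for the whole triplet plus mask terms for both segments."""
    cls = (cross_entropy(pred["subj_logits"], gt["subj_cls"])
           + cross_entropy(pred["obj_logits"], gt["obj_cls"])
           + cross_entropy(pred["pred_logits"], gt["pred_cls"]))
    mask = (dice_loss(pred["subj_mask"], gt["subj_mask"])
            + dice_loss(pred["obj_mask"], gt["obj_mask"]))
    return cls + w_mask * mask


pred = {"subj_logits": [0.0, 0.0], "obj_logits": [0.0, 0.0], "pred_logits": [0.0, 0.0],
        "subj_mask": [1.0, 1.0, 0.0, 0.0], "obj_mask": [0.0, 0.0, 1.0, 1.0]}
gt = {"subj_cls": 0, "obj_cls": 1, "pred_cls": 0,
      "subj_mask": [1, 1, 0, 0], "obj_mask": [0, 0, 1, 1]}
print(triplet_loss(pred, gt))  # 3 * log(2), since masks are perfect and logits uniform
```

In a DETR-style model this loss would be applied only after predicted triplets are matched one-to-one to ground-truth triplets.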