image

paper, dataset, code

TL;DR

image

  • task : Propose a segmentation-based SGG task, panoptic scene graph generation (PSG).
  • problem : Many datasets have been proposed for SGG, but bbox-based SGG is problematic: boxes contain redundant pixels (e.g. hair) and leave out background regions.
  • idea : Propose a new dataset, plus two-stage and one-stage baselines.
  • architecture : (one-stage baselines) 1) PSGTR: feed triplet queries into DETR and decode the triplets directly. 2) PSGFormer: create relation queries and object queries separately, select the object queries most related to each relation query by cosine similarity to serve as subject and object, and pass them through two FFN layers to form the triplet.
  • objective : SGG triplet loss; since the outputs are masks rather than boxes, segmentation losses presumably replace the bbox loss.
  • baselines : two-stage models (IMP, MOTIFS, VCTree, GPSNet)
  • data : take the images that overlap between Visual Genome and COCO, then create new annotations → "PSG dataset"
  • result : Existing two-stage models were adapted to PSG; the proposed one-stage baselines perform better.
  • contribution : Building the PSG dataset & providing baselines.

Details

SGG datasets

image
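The dataset construction described in the TL;DR (keep the images shared between Visual Genome and COCO, then re-annotate them) amounts to a set intersection over image ids. A minimal sketch, where the function name and the id lists are hypothetical:

```python
def overlapping_image_ids(vg_ids, coco_ids):
    """Keep only images present in both Visual Genome and COCO;
    these are the candidates that receive fresh panoptic annotations."""
    return sorted(set(vg_ids) & set(coco_ids))

# hypothetical ids, for illustration only
shared = overlapping_image_ids([101, 102, 103], [102, 103, 104])
print(shared)  # → [102, 103]
```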

PSGTR

image

PSGFormer

image
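The selection step described in the TL;DR, where each relation query picks its most related object query by cosine similarity, can be sketched as below. This is a plain-Python illustration of the matching idea only (function names are assumptions, and the real model operates on learned embeddings with separate matching for subject and object slots):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_relations_to_objects(rel_queries, obj_queries):
    """For each relation query, return the index of the most similar
    object query -- the cosine-similarity selection described above."""
    return [
        max(range(len(obj_queries)), key=lambda j: cosine(r, obj_queries[j]))
        for r in rel_queries
    ]
```

In the model as summarized above, the matched object queries then pass through two FFN layers to produce the subject/object predictions; this sketch covers only the similarity-based selection.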