image

paper

TL;DR

  • task : instance segmentation
  • problem : mask annotation for segmentation is too expensive, while weakly-supervised methods reach only about 85% of fully supervised performance
  • idea : use point-level annotation. Annotate the bounding box first, then sample 10 random points and have the annotator label each one as object or background.
  • architecture : Mask R-CNN
  • objective : bilinearly interpolate the predicted mask at the 10 annotated points, then apply a cross-entropy loss on those points
  • baseline : fully supervised Mask R-CNN
  • data : ImageNet, COCO
  • result : reaches about 97% of fully supervised performance on ImageNet and about 99% on COCO.
  • contribution : conventional mask annotation takes about 79 seconds per instance, whereas this scheme allows annotation in about 7 seconds.
  • limitation or part not understood : did not read the PointRend model part
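
The objective in the TL;DR (bilinear interpolation of the prediction at the labeled points, followed by cross-entropy) can be sketched as below. This is a minimal pure-Python illustration, not the paper's implementation: the function names `bilinear_sample` and `point_bce_loss`, the pixel-index coordinate convention, and averaging over points are all assumptions made for the example.

```python
import math

def bilinear_sample(grid, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at a continuous point.

    x indexes columns and y indexes rows, both in pixel-index coordinates
    (an assumed convention for this sketch).
    """
    h, w = len(grid), len(grid[0])
    # Clamp the point so it stays inside the grid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def point_bce_loss(mask_logits, points, labels):
    """Mean binary cross-entropy over the annotated points only.

    mask_logits: 2D list of predicted mask logits for one instance.
    points: list of (x, y) locations labeled by the annotator.
    labels: list of 1 (object) or 0 (background), one per point.
    """
    total = 0.0
    for (x, y), label in zip(points, labels):
        z = bilinear_sample(mask_logits, x, y)
        # Numerically stable BCE with logits.
        total += max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))
    return total / len(points)

# Usage: a 2x2 logit map with two labeled points.
logits = [[0.0, 2.0], [4.0, 6.0]]
loss = point_bce_loss(logits, [(0.5, 0.5), (0.0, 0.0)], [1, 0])
```

Only the labeled points contribute to the loss, so every other pixel of the mask prediction receives no direct supervision in this sketch.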

Details

image

image

Dice seems to be the usual choice for segmentation and IoU for object detection. Is there no particular reason for this split?
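
One point that may help with this question: for binary masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so they always rank predictions in the same order. The sketch below only checks that identity on a toy example. The helper names `iou` and `dice` are hypothetical, masks are flattened lists of 0/1, and returning 1.0 for two empty masks is an assumed convention.

```python
def iou(pred, target):
    """Intersection over union of two flattened binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(max(p, t) for p, t in zip(pred, target))
    return inter / union if union else 1.0

def dice(pred, target):
    """Dice coefficient of two flattened binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0

pred = [1, 1, 1, 0]
target = [0, 1, 1, 1]
i, d = iou(pred, target), dice(pred, target)
# The two metrics agree through Dice = 2 * IoU / (1 + IoU).
identity_gap = abs(d - 2 * i / (1 + i))
```

Since the mapping is monotonic, the choice between them may be more a matter of community convention than of ranking behavior, although as training losses their gradients differ.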