paper, code, dataset

TL;DR

  • I read this because.. : Related to my own research
  • task : Learn human preferences over text-to-image (T2I) generation outputs
  • Problem : FID is not a good proxy for human preference. We need an open-source preference dataset.
  • Idea : Create a web app to collect human preference data
  • input/output : {image, prompt} -> score
  • architecture : ViT-H/14
  • objective : KL divergence
  • baseline : Aesthetic score, CLIP-H, ImageReward, HPS, Human Expert
  • data : Pick-a-Pic (the paper uses 583K training samples, plus 500 validation and 500 test samples)
  • evaluation : Preference-prediction accuracy, where a preference is predicted only when the difference in scores is above a threshold. Spearman correlation with human experts
  • result : Highest accuracy and correlation. It was also preferred in the classifier-free guidance comparison.
  • contribution : Large-scale data release. Model release. Reports the performance improvements obtained with it.
  • etc. : NeurIPS papers seem to include a lot of data releases.
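
The thresholded evaluation in the list above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names, the label encoding (0 = tie, 1 = first image, 2 = second image), and the plain match-counting accuracy are assumptions made for the example.

```python
def predict_preference(score_1, score_2, threshold):
    """Return 1 or 2 for the preferred image, or 0 for a tie.

    Assumption: a preference is predicted only when the score gap
    exceeds `threshold`; otherwise the pair counts as a tie.
    """
    diff = score_1 - score_2
    if abs(diff) <= threshold:
        return 0
    return 1 if diff > 0 else 2


def accuracy(score_pairs, labels, threshold):
    """Fraction of pairs whose prediction matches the human label (0, 1, or 2)."""
    hits = sum(
        predict_preference(s1, s2, threshold) == label
        for (s1, s2), label in zip(score_pairs, labels)
    )
    return hits / len(labels)
```

In practice the threshold would be tuned on the validation split and then fixed for the test split.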

Details

annotation

  • The prompt is entered by the user
  • Images are generated by Stable Diffusion 2.1, Dreamlike Photoreal 2.0, and Stable Diffusion XL variants

Pick-a-Pic Dataset

  • 968K rankings in total
  • The paper used 583K rankings from 37K prompts and 4K users
  • Several measures are taken to ensure data quality (email verification, bot detection…)

PickScore

  • CLIP image

  • finetuning loss image

$s$: score, $x$: prompt, $y_1, y_2$: images
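
Since the loss figure did not survive, here is a minimal sketch of what a KL-divergence preference objective over $s$, $x$, $y_1$, $y_2$ could look like. It assumes the score is a temperature-scaled cosine similarity between CLIP text and image embeddings, and that the target distribution is (1, 0), (0, 1), or (0.5, 0.5) for a tie. The function names and the temperature argument are hypothetical, and plain Python lists stand in for real embeddings.

```python
import math


def clip_score(text_emb, image_emb, temperature):
    """s(x, y): temperature-scaled cosine similarity (assumed form of the score)."""
    dot = sum(t * i for t, i in zip(text_emb, image_emb))
    norm_t = math.sqrt(sum(t * t for t in text_emb))
    norm_i = math.sqrt(sum(i * i for i in image_emb))
    return temperature * dot / (norm_t * norm_i)


def kl_preference_loss(s1, s2, target):
    """KL(target || softmax([s1, s2])) for one prompt with two images.

    `target` is the human preference distribution, e.g. (1.0, 0.0),
    (0.0, 1.0), or (0.5, 0.5) for a tie (assumed encoding).
    """
    # Log-softmax computed in a numerically stable way.
    m = max(s1, s2)
    log_z = m + math.log(math.exp(s1 - m) + math.exp(s2 - m))
    log_p_hat = (s1 - log_z, s2 - log_z)
    # Terms with zero target probability contribute nothing to the KL sum.
    return sum(p * (math.log(p) - lq) for p, lq in zip(target, log_p_hat) if p > 0)
```

With equal scores and a one-hot target the loss is log 2, and it shrinks as the preferred image's score rises above the other one.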

They tried in-batch negatives, but these didn’t perform well. Training: 4,000 steps, lr 3e-6, batch size 128, 500 warmup steps. Reportedly took less than an hour on 8 A100s.

Result

  • Reranking visualization (CLIP-H vs. Pick-a-Pic) image

  • accuracy image

  • What we learned with classifier-free guidance image

  • Correlation with human experts image

  • Comparison to other models image

  • Why not COCO? Generating images from COCO prompts is still the most common way to evaluate, but COCO captions describe generic objects, which is not what users actually want. image

  • Just generated vs. reranked with PickScore image
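
The "generated vs. reranked" comparison amounts to sampling several candidates for one prompt and keeping the highest-scoring one. A minimal sketch, assuming a hypothetical `score_fn(prompt, image)` that returns a scalar such as a PickScore value:

```python
def rerank(prompt, candidates, score_fn):
    """Sort candidate images for one prompt from highest to lowest score.

    `score_fn` is a placeholder for any preference scorer; it is an
    assumption of this sketch, not an API from the paper's code.
    """
    return sorted(candidates, key=lambda image: score_fn(prompt, image), reverse=True)


def pick_best(prompt, candidates, score_fn):
    """Return the single candidate with the highest score."""
    return max(candidates, key=lambda image: score_fn(prompt, image))
```

For example, with a toy scorer that looks up precomputed scores by image id, `pick_best` returns the id with the largest score.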