TL;DR
- I read this because : related to my own research
- task : Learn human preference for T2I generation product
- Problem : Measuring with FID is not a good representation of human preference. We need an open-source preference dataset.
- Idea : Create a webpage to collect human preference data
- input/output : {image, prompt} -> score
- architecture : ViT-H/14
- objective : KL divergence
- baseline : Aesthetic score, CLIP-H, ImageReward, HPS, Human Expert
- data : Pick-a-Pic (the paper uses 583K training rankings, with 500 validation and 500 test samples)
- evaluation : pairwise accuracy, counting a preference only when the score difference exceeds a threshold; Spearman correlation with human experts
- result : highest accuracy and correlation; reranking with PickScore was preferred over the classifier-free guidance technique
- contribution : releases a large dataset and the model, and reports the performance gains from using them
- etc. : the NeurIPS paper discloses a lot of detail about the dataset
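The input/output ({image, prompt} -> score) and KL objective above can be sketched as follows. This is a minimal numpy sketch under my assumptions: the encoders are stand-ins for the ViT-H/14 CLIP text/image towers, and `temperature` is an assumed scaling constant, not the paper's exact value.

```python
import numpy as np

def pickscore(text_emb, img_emb, temperature=100.0):
    """PickScore-style score: scaled dot product of L2-normalized
    text and image embeddings ({image, prompt} -> score).
    The embeddings here stand in for CLIP ViT-H/14 outputs."""
    t = text_emb / np.linalg.norm(text_emb)
    i = img_emb / np.linalg.norm(img_emb)
    return temperature * float(t @ i)

def kl_preference_loss(s1, s2, p=(1.0, 0.0)):
    """KL divergence between the label distribution p over the two
    images and the softmax of their predicted scores."""
    z = np.array([s1, s2])
    z = z - z.max()                      # numerical stability
    q = np.exp(z) / np.exp(z).sum()      # predicted preference distribution
    eps = 1e-12
    # sum only over labels with nonzero mass (0 * log 0 := 0)
    return float(sum(pi * np.log(pi / (qi + eps))
                     for pi, qi in zip(p, q) if pi > 0))
```

With equal scores and a tie label `p=(0.5, 0.5)`, the loss is zero; as the preferred image's score grows, the loss shrinks toward zero.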
Details
annotation
- prompts are entered by users
- Image generation is supported by Stable Diffusion 2.1, Dreamlike Photoreal 2.0, and Stable Diffusion XL variants
Pick-a-Pic Dataset
- Total 968K ranking
- The paper used 583K rankings from 37K prompts and 4K users
- Various measures taken to ensure data quality (email verification, bot detection, …)
PickScore
CLIP
finetuning loss
$s$ : score, $x$ : prompt, $y_1, y_2$ : the two images
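With these symbols, the objective can be written out; this is my reconstruction from the paper's description, where $p$ is the label distribution ($(1,0)$, $(0,1)$, or $(0.5,0.5)$ for ties):

```latex
\hat{p}_i = \frac{\exp s(x, y_i)}{\exp s(x, y_1) + \exp s(x, y_2)}, \qquad
\mathcal{L} = \mathrm{KL}\!\left(p \,\Vert\, \hat{p}\right)
            = \sum_{i \in \{1,2\}} p_i \log \frac{p_i}{\hat{p}_i}
```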
They tried in-batch negatives, but they did not perform well. Training: 4,000 steps, lr 3e-6, batch size 128, warmup 500 steps; reportedly took less than an hour on 8 A100s.
Result
rerank via CLIP-H vs Pick-a-Pic
accuracy
What we learned with classifier-free guidance
correlation with human experts
Comparison to other models
why not COCO? Generating images from COCO prompts is still the most popular evaluation setup, but COCO prompts describe generic objects, which is not what real users ask for.
Just generated vs. reranked with PickScore
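Reranking here means best-of-n selection: generate several candidates for one prompt and keep the highest-scoring one. A minimal sketch, where `score_fn` is a stand-in for PickScore (names and signature are my own, not the paper's):

```python
def rerank(candidates, score_fn, prompt):
    """Best-of-n selection: given candidate images generated for one
    prompt, return the candidate that score_fn rates highest.
    score_fn(prompt, image) stands in for a preference model
    such as PickScore."""
    return max(candidates, key=lambda img: score_fn(prompt, img))
```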