
TL;DR
- I read this because.. : continuing the efficient finetuning series
- task : LLM finetuning
- problem : full finetuning is parameter-inefficient, and searching for discrete prompts is inefficient.
- idea : prepend a continuous prompt (prefix) and train only its parameters.
- architecture : BART, GPT-2
- objective : cross-entropy loss
- baseline : finetuning, finetuning top 2 layers, adapter
- data : E2E, WebNLG, DART
- result : slightly lower than finetuning, slightly better than adapter or ft-top2
- contribution : An idea similar to #113
Details

The PLM is kept frozen; separately, a matrix $P_\theta$ of shape (prefix length $\times$ hidden dimension) holds the prefix activations.


They found that starting with a smaller matrix $P_{\theta'}$ and expanding it to full size with an MLP trained better. After training, the prefix $P_\theta$ can be used directly and $P_{\theta'}$ with the MLP discarded.
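A minimal PyTorch sketch of this reparametrization (names, sizes, and the two-layer MLP shape are my assumptions, not taken from the paper's code):

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Hypothetical sketch: a small matrix P'_theta is expanded by an MLP
    into the full prefix P_theta used by the frozen PLM."""

    def __init__(self, prefix_len=10, small_dim=64, hidden_dim=768, n_layers=12):
        super().__init__()
        # P'_theta: (prefix_len, small_dim) -- far fewer parameters than P_theta
        self.small = nn.Parameter(torch.randn(prefix_len, small_dim))
        # MLP expands each row to per-layer key/value activations (2 = K and V)
        self.mlp = nn.Sequential(
            nn.Linear(small_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, n_layers * 2 * hidden_dim),
        )

    def forward(self):
        # Full prefix P_theta; after training, this output can be cached
        # so P'_theta and the MLP are no longer needed at inference time.
        return self.mlp(self.small)

enc = PrefixEncoder()
prefix = enc()
print(prefix.shape)  # torch.Size([10, 18432])
```

Only `enc` is trained; the PLM's weights stay frozen and receive `prefix` as extra key/value activations at every layer.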
Results

Ablations
- In low-data settings, initializing the prefix with embeddings of real words was better than random initialization.

Even a task-irrelevant word like "elephant" beat random initialization. With full data, the choice of initialization had no significant effect.
Performance improved with prefix length up to a task-dependent point: around 200 for summarization, around 10 for table-to-text.

The prefix form, $[prefix; x; y]$, performed better than the infix form, $[x; infix; y]$.
