
TL;DR
- I read this because.. : continuing the efficient finetuning series
- task : LLM finetuning
- problem : full finetuning is parameter-inefficient, and searching for discrete prompts is inefficient.
- idea : prepend a continuous prompt (prefix) and train only its parameters.
- architecture : BART, GPT-2
- objective : cross-entropy loss
- baseline : finetuning, finetuning top 2 layers, adapter
- data : E2E, WebNLG, DART
- result : slightly lower than finetuning, slightly better than adapter or ft-top2
- contribution : An idea similar to #113
Details

The PLM is kept frozen; separately, a matrix $P_\theta$ of shape (prefix length $\times$ hidden dimension) holds the prefix activations.


They found that starting with a smaller matrix $P_{\theta'}$ and expanding it to full size with an MLP trained better. After training, the prefix $P_\theta$ can be used directly and $P_{\theta'}$ with the MLP discarded.
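A minimal PyTorch sketch of this reparametrization (names, sizes, and the two-layer MLP shape are my assumptions, not taken from the paper's code):

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Hypothetical sketch: a small matrix P'_theta is expanded by an MLP
    into the full prefix P_theta used by the frozen PLM."""

    def __init__(self, prefix_len=10, small_dim=64, hidden_dim=768, n_layers=12):
        super().__init__()
        # P'_theta: (prefix_len, small_dim) -- far fewer parameters than P_theta
        self.small = nn.Parameter(torch.randn(prefix_len, small_dim))
        # MLP expands each row to per-layer key/value activations (2 = K and V)
        self.mlp = nn.Sequential(
            nn.Linear(small_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, n_layers * 2 * hidden_dim),
        )

    def forward(self):
        # Full prefix P_theta; after training, this output can be cached
        # so P'_theta and the MLP are no longer needed at inference time.
        return self.mlp(self.small)

enc = PrefixEncoder()
prefix = enc()
print(prefix.shape)  # torch.Size([10, 18432])
```

Only `enc` is trained; the PLM's weights stay frozen and receive `prefix` as extra key/value activations at every layer.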
Results

Ablations
- In low-data settings, initializing the prefix with embeddings of real words was better than random initialization.

Even a task-irrelevant word like "elephant" beat random initialization. With full data, the choice of initialization had no significant effect.
Performance improved with prefix length up to a task-dependent point: around 200 for summarization, around 10 for table-to-text.

The prefix form, $[prefix; x; y]$, performed better than the infix form, $[x; infix; y]$.
