TL;DR
- I read this because.. : data synthesis
- task : data synthesis, augmentation
- problem : more diverse data synthesis
- idea : corpus-to-persona, then persona-to-instruction data or persona-to-text corpus
- input/output : corpus -> persona -> instruction data or personalized corpus
- architecture : Qwen2-7B
- objective : cross-entropy (CE) loss
- baseline : SOTA LLMs
- data : 200K persona hub, 150K problems (proposed)
- evaluation : held-out test set, MATH
- result : robust on MATH
- contribution :
- etc. :
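
The corpus -> persona -> instruction flow noted above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt templates, the `synthesize` helper, and the `fake_llm` stand-in are all hypothetical names and wording made up for this note. In practice `llm` would be any callable that sends a prompt to a real model and returns its text.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical prompt templates; the wording is an assumption, not the paper's.
TEXT_TO_PERSONA = (
    "Who is likely to read, write, or like the following text?\n\n"
    "{text}\n\nPersona:"
)
PERSONA_TO_INSTRUCTION = (
    "Create a {task} that the following persona might ask.\n\n"
    "Persona: {persona}\n\n{task}:"
)


def synthesize(
    corpus: Iterable[str],
    llm: Callable[[str], str],
    task: str = "math problem",
) -> List[Dict[str, str]]:
    """For each text: infer a persona, then ask for one instruction from that persona."""
    samples = []
    for text in corpus:
        # Step 1: corpus -> persona
        persona = llm(TEXT_TO_PERSONA.format(text=text)).strip()
        # Step 2: persona -> instruction data
        instruction = llm(
            PERSONA_TO_INSTRUCTION.format(task=task, persona=persona)
        ).strip()
        samples.append({"persona": persona, "instruction": instruction})
    return samples


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    if prompt.startswith("Who is likely"):
        return "a pediatric nurse"
    return "A nurse gives 3 doses of 2.5 mL each. How many mL in total?"


if __name__ == "__main__":
    for sample in synthesize(["Guidelines for pediatric medication dosing."], fake_llm):
        print(sample)
```

Diversity comes from the persona step: different source texts yield different personas, and the same task prompt conditioned on different personas yields different instructions.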