
paper, dataset

TL;DR

  • I read this because... : data synthesis
  • task : data synthesis, augmentation
  • problem : more diverse data synthesis
  • idea : corpus-to-persona, then persona-to-instruction data or persona-to-text corpus
  • input/output : corpus -> persona -> instruction data or personalized corpus
  • architecture : Qwen2-7B
  • objective : cross-entropy (CE) loss
  • baseline : SOTA LLMs
  • data : 200K persona hub, 150K problems (proposed)
  • evaluation : held-out test set, MATH
  • result : robust on MATH
  • contribution :
  • etc. :
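
For reference, the CE objective listed above is presumably the standard next-token cross-entropy used for supervised fine-tuning. The notation here is an assumption and is not taken from the paper: $x$ is a synthesized instruction, $y = (y_1, \dots, y_T)$ its response, and $p_\theta$ the fine-tuned model (Qwen2-7B in this note).

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid x, y_{<t}\right)
$$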

Details

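
The corpus -> persona -> instruction flow from the TL;DR can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the two prompt templates are hypothetical paraphrases, `fake_llm` is a stand-in so the sketch runs offline, and the exact-match deduplication is a simplification.

```python
from typing import Callable, Dict, List

# Hypothetical prompt templates (assumptions, not the paper's exact wording).
TEXT_TO_PERSONA = (
    "Who is likely to read, write, or like the following text?\n\n"
    "{text}\n\n"
    "Describe that persona in one sentence."
)
PERSONA_TO_INSTRUCTION = (
    "Create a challenging math problem that the following persona might pose.\n\n"
    "Persona: {persona}"
)


def text_to_persona(text: str, llm: Callable[[str], str]) -> str:
    """Corpus-to-persona step: infer a persona from a piece of text."""
    return llm(TEXT_TO_PERSONA.format(text=text)).strip()


def persona_to_instruction(persona: str, llm: Callable[[str], str]) -> str:
    """Persona-to-instruction step: ask for a problem from that persona's view."""
    return llm(PERSONA_TO_INSTRUCTION.format(persona=persona)).strip()


def synthesize(corpus: List[str], llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run both steps over a corpus, skipping personas already seen."""
    records: List[Dict[str, str]] = []
    seen = set()
    for text in corpus:
        persona = text_to_persona(text, llm)
        if persona in seen:  # exact-match dedup only; a simplification
            continue
        seen.add(persona)
        records.append(
            {"persona": persona, "instruction": persona_to_instruction(persona, llm)}
        )
    return records


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call (hypothetical helper)."""
    if prompt.startswith("Create"):
        persona = prompt.split("Persona: ", 1)[1]
        return f"A word problem posed by {persona}"
    text = prompt.split("\n\n")[1]  # the corpus text sits between the two prompt parts
    return f"someone interested in {text.lower()}"


corpus = ["Pediatric nursing", "Pediatric nursing", "Orbital mechanics"]
records = synthesize(corpus, fake_llm)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; swapping `fake_llm` for a real call would leave the rest unchanged.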

result
