TL;DR
- I read this because : ORM (Outcome Reward Model) is mentioned a lot. I'm not sure this is the exact paper being referred to, but it is cited in the Omega PRM paper.
- task : LLM in math problem solving
- problem : LMs have made a lot of progress, but they still struggle with multi-step mathematical reasoning.
- idea : Propose a dataset. After finetuning, sample 100 solutions per problem, label them correct/incorrect, and train a verifier on them. At test time, sample several solutions and select the one the verifier scores highest as the final answer.
- architecture : GPT3 6B / 175B
- objective : scalar head on the LM for the verifier (maybe BCE loss?) / CE loss for finetuning
- baseline : finetuning
- data : GSM8K (proposed)
- evaluation : test set solve rate
- result : the 175B finetuned model outperforms the 6B one
- contribution : GSM8K proposal / multi-step math reasoning problem solved? / predecessor of RFT…?
- etc. :
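The sample-then-verify idea above can be sketched as best-of-n selection. Everything here (`toy_generate`, `toy_score`) is a hypothetical stand-in, not the paper's actual GPT-3 sampler or verifier:

```python
def best_of_n(problem, generate, verifier_score, n=100):
    """Sample n candidate solutions and keep the verifier's top pick."""
    candidates = [generate(problem, i) for i in range(n)]
    return max(candidates, key=verifier_score)

# Toy stand-ins for illustration only (not a real LM or verifier):
def toy_generate(problem, seed):
    return f"solution-{seed % 5}"  # pretend sampled completion

def toy_score(candidate):
    return int(candidate.split("-")[1])  # pretend verifier score

print(best_of_n("2+2?", toy_generate, toy_score, n=10))  # → solution-4
```

At test time the paper samples many completions per problem and returns the verifier's highest-scoring one, which is exactly this `max` over candidates.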
Details
The verifier overfits quickly on the 100 sampled solutions per problem, so training is limited to 2 epochs.
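A minimal sketch of the verifier objective as the note guesses it (BCE on a scalar head), using a toy logistic model rather than the actual GPT-3 setup; the 2-epoch limit mirrors the overfitting note above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for one (prediction, label) pair."""
    eps = 1e-9
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

# Toy labeled samples: (feature, label), label 1 = solution judged correct.
data = [(2.0, 1), (1.5, 1), (-1.0, 0), (-2.5, 0)]

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(2):  # only 2 epochs, since the verifier overfits fast
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += bce(p, y)
        # gradient of BCE w.r.t. the logit is (p - y)
        w -= lr * (p - y) * x
        b -= lr * (p - y)
    print(f"epoch {epoch}: mean BCE = {total / len(data):.3f}")
```

The scalar head in the paper plays the role of `w * x + b` here: a single score per solution, squashed to a correctness probability.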