TL;DR
- Why I read this: to improve MathVista performance
- Task: LVLM (math reasoning)
- Problem: existing math LVLMs have drawbacks: G-LLaVA is limited to geometric reasoning, and Math-LLaVA has limited CoT capability.
- Idea: build a dataset spanning multiple math disciplines, with CoT annotations
- Architecture: LLaVA (CLIP-ViT-Large, DeepSeekMath-RL)
- Objective: CE loss + PPO loss
- Baselines: closed LLMs, LLMs, math LLMs, open-source MLLMs (G-LLaVA-7B, Math-LLaVA-13B, LLaVA-1.5-7B, LLaVA-NeXT-34B)
- Data: (align) LLaVA-Pretrain + Geo170K-align; (instruct) LLaVA-Instruct; (math instruct) MultiMath300K-instruction, Geo170K-qa, MathV360K; (PPO) MultiMath300K-val, GSM8K-train, MATH-train, CMATH-train
- Evaluation: MathVista, MathVerse, GSM8K, MATH, CMATH, GaoKao
- Result: best MathVista and MathVerse performance among open-source models, and the best text math benchmark scores of any open-source MLLM.
- Contribution: both a new dataset and high performance on text and vision math benchmarks
- Etc.: the findings may be unsurprising, but the analysis is interesting
Details
Thumbnail
Proposed MultiMath-300K
- Images are newly collected under their own license (source: http://test.xuekubao.com/)
- Not just QA; images are also captioned
- Covers geometry problem solving, automatic theorem proving, and mathematical word problems
- Claimed to be English/Chinese, but it seems to be almost entirely Chinese…?
- Also covers CoT
Collection Method
- Round 1: generate step-by-step reasoning chains with GPT-4o, using the original data as a hint
- Round 2: check whether the GPT-4o chain is consistent with the standard answer; if not, revise the reasoning steps
- Round 3: compare the GPT-4o answer with the standard answer and keep only the correct samples
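The three rounds above can be sketched as a small filtering loop. This is a minimal sketch under my own assumptions: `call_gpt4o`, `extract_final_answer`, and the prompts are hypothetical stand-ins (here `call_gpt4o` is a stub returning a canned chain), not the paper's actual pipeline.

```python
def call_gpt4o(prompt: str) -> str:
    """Stub standing in for a GPT-4o API call; returns a canned CoT."""
    return "Step 1: ... Step 2: ... Answer: 42"

def extract_final_answer(cot: str) -> str:
    """Pull the text after the last 'Answer:' marker."""
    return cot.rsplit("Answer:", 1)[-1].strip()

def collect_cot(question: str, standard_answer: str, max_revisions: int = 1):
    # Round 1: generate a step-by-step chain, using the original data as a hint.
    cot = call_gpt4o(f"Solve step by step.\nQ: {question}\nHint: {standard_answer}")
    # Round 2: compare against the standard answer; revise inconsistent chains.
    for _ in range(max_revisions):
        if extract_final_answer(cot) == standard_answer:
            break
        cot = call_gpt4o(f"The chain disagrees with {standard_answer}. Revise:\n{cot}")
    # Round 3: keep only samples whose final answer matches the standard answer.
    return cot if extract_final_answer(cot) == standard_answer else None
```

Samples that still disagree with the standard answer after revision are dropped (`None`), which is how round 3 filters the dataset.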
training
- (align) LLaVA-Pretrain + Geo170K-align: 1 epoch
- (instruct) LLaVA-Instruct: the ViT is also fully tuned
- (math instruct) MultiMath300K-instruction, Geo170K-qa, MathV360K
- (PPO) built from MultiMath300K-val, GSM8K-train, MATH-train, and CMATH-train
Process-supervised RL
- Prompt for CoT reasoning so the model generates multiple reasoning steps
- GPT-4o evaluates correctness, locates the step where the error occurs, and generates a correct solution
- This yields preferred/dispreferred pairs -> train a reward model -> PPO
- PPO is trained with reward scores on the reasoning steps generated by the actor model
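A sketch of how the preferred/dispreferred pairs might be formed once GPT-4o has flagged the first erroneous step and produced a corrected continuation. The function name, the error index convention, and the toy steps are all my own illustration, not the paper's code.

```python
def build_preference_pair(steps, first_error_idx, corrected_steps):
    """Share the prefix up to the first error; the dispreferred chain keeps
    the faulty step, the preferred chain swaps in the corrected continuation."""
    prefix = steps[:first_error_idx]
    dispreferred = prefix + steps[first_error_idx:]
    preferred = prefix + corrected_steps
    return preferred, dispreferred

# Toy example: step 1 contains the arithmetic error (12 / 4 != 4).
steps = ["set x = area / width", "x = 12 / 4 = 4", "answer: 4"]
corrected = ["x = 12 / 3 = 4", "answer: 4"]
pref, dispref = build_preference_pair(steps, 1, corrected)
```

The reward model is then trained to score `pref` above `dispref`, and PPO optimizes the actor against the per-step scores from that reward model.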
Result
Best performance among open-source models, though still behind closed models
Text performance
Other open-source math-specific models underperform LLaVA-NeXT.
Contribution of RL
The domains used in the PPO stage (CMATH, GSM8K, MATH) improved, while unused ones did not. MathVista increased by 0.8 (vs. gains of 1.3 and 1.6 from the align and SFT stages, respectively), and MathVerse decreased by 0.2.
LLM backbone
Significant performance gap vs. Vicuna: MathVista 42.9 vs. 50.0, wow. This may be partly because MultiMath is mostly in Chinese. Still, Table 3 shows the model did learn regardless.