Molmo2-SynMultiImageQA is a collection of synthetic multi-image question-answer pairs about various kinds of text-rich images, including charts, tables, documents, diagrams, etc.
The synthetic data is generated by extending the CoSyn framework into multi-image settings,
with Claude Sonnet 4.5 as the coding LLM to generate code that is executed to render each image.
Then, we use GPT-5 to generate question-answer pairs with code (without using the rendered image).
Each example has the following fields:
id: the unique ID of the example
images: a list of images rendered from the code
code: a list of the source code for each image
qa_pairs: a list of questions, answers, and chain-of-thought explanations
qa_pairs_raw: the QA pairs in raw form, where image references are kept as <IMAGE-N> placeholders instead of being rewritten into natural language
metadata: metadata for each example, including the content type, persona, overall description, and the number of images
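To illustrate the relationship between qa_pairs_raw and the natural-format QA pairs, the placeholder rewriting can be sketched as below. Note that the exact placeholder grammar and the wording used in the released data are assumptions for illustration, not the actual pipeline code:

```python
import re

# Hypothetical raw question; the exact contents of `qa_pairs_raw` may differ.
raw_question = "What value does the bar in <IMAGE-1> correspond to in <IMAGE-2>?"

ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}

def naturalize(text: str) -> str:
    """Replace <IMAGE-N> placeholders with natural-language image references."""
    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        ordinal = ORDINALS.get(n, f"{n}th")
        return f"the {ordinal} image"
    return re.sub(r"<IMAGE-(\d+)>", repl, text)

print(naturalize(raw_question))
# → What value does the bar in the first image correspond to in the second image?
```

Keeping the raw placeholders alongside the rewritten questions makes it easy to re-ground each question to a specific image in the images list.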
Splits
The data is divided into train and validation splits. These splits are "unofficial" because we do not generally use this data for evaluation.
However, they reflect the setup used when training the Molmo2 models, which were trained only on the train split.
License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
This dataset includes synthetic images rendered from code generated by Claude Sonnet 4.5, which is subject to Anthropic's Terms of Service.
The questions were generated by GPT-5, which is subject to OpenAI's Terms of Use.