Molmo2-SynMultiImageQA is a collection of synthetic multi-image question-answer pairs about various kinds of text-rich images, including charts, tables, documents, diagrams, etc.
The synthetic data is generated by extending the CoSyn framework into multi-image settings,
with Claude Sonnet 4.5 as the coding LLM to generate code that is executed to render each image.
Then, we use GPT-5 to generate question-answer pairs with code (without using the rendered image).
Each example has the following fields:
id: the unique ID of the example
images: a list of images rendered from the code
code: a list of the source code for each image
qa_pairs: a list of questions, answers, and chain-of-thought explanations
qa_pairs_raw: the QA pairs in raw form, where image references are kept as <IMAGE-N> placeholders instead of being rewritten into natural language
metadata: metadata for each example, including the content type, persona, overall description, and the number of images
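To illustrate the relationship between qa_pairs_raw and the natural-format QA pairs, the placeholder rewriting can be sketched as below. Note that the exact placeholder grammar and the wording used in the released data are assumptions for illustration, not the actual pipeline code:

```python
import re

# Hypothetical raw question; the exact contents of `qa_pairs_raw` may differ.
raw_question = "What value does the bar in <IMAGE-1> correspond to in <IMAGE-2>?"

ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}

def naturalize(text: str) -> str:
    """Replace <IMAGE-N> placeholders with natural-language image references."""
    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        ordinal = ORDINALS.get(n, f"{n}th")
        return f"the {ordinal} image"
    return re.sub(r"<IMAGE-(\d+)>", repl, text)

print(naturalize(raw_question))
# → What value does the bar in the first image correspond to in the second image?
```

Keeping the raw placeholders alongside the rewritten questions makes it easy to re-ground each question to a specific image in the images list.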
Splits
The data is divided into train and validation splits. These splits are "unofficial" because we do not generally use this data for evaluation.
However, they reflect the setup used when training the Molmo2 models, which were trained only on the train split.
License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
This dataset includes synthetic images rendered from code generated by Claude Sonnet 4.5, which is subject to Anthropic's Terms of Service.
The questions were generated by GPT-5, which is subject to OpenAI's Terms of Use.