Dataset

Molmo2 Synmultiimageqa

by allenai hf-dataset--allenai--molmo2-synmultiimageqa
Nexus Index
28.5 Top 100%
S: Semantic 50
A: Authority 0
P: Popularity 47
R: Recency 52
Q: Quality 30
Entity Passport
Registry ID hf-dataset--allenai--molmo2-synmultiimageqa
License odc-by
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__allenai__molmo2_synmultiimageqa,
  author = {allenai},
  title = {Molmo2 Synmultiimageqa Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/allenai/molmo2-synmultiimageqa}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
allenai. (2026). Molmo2 Synmultiimageqa [Dataset]. Free2AITools. https://huggingface.co/datasets/allenai/molmo2-synmultiimageqa

Technical Deep Dive


⬇️
Downloads
15,973


Dataset Specification

Molmo2-SynMultiImageQA

Molmo2-SynMultiImageQA is a collection of synthetic multi-image question-answer pairs about various kinds of text-rich images, including charts, tables, documents, diagrams, etc.

The synthetic data is generated by extending the CoSyn framework to multi-image settings. Claude-Sonnet-4.5 serves as the coding LLM, generating code that can be executed to render each image. GPT-5 then generates the question-answer pairs from the code alone, without using the rendered image.

Molmo2-SynMultiImageQA is part of the Molmo2 dataset collection and was used to train the Molmo2 family of models.


Loading

The dataset has eight subsets:

  • chart: charts and plots
  • chemical: chemical structures
  • circuit: diagrams of electrical circuits
  • diagram: diagrams and graphs
  • document: various types of documents
  • graphic: vector graphics
  • music: music sheets
  • table: tables and sheets

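Pass the subset name as the configuration name (the second argument to load_dataset) to specify which one to load. By default, chart will be loaded. For example:

```python
import datasets

# Load the train split of the "table" subset.
table_dataset = datasets.load_dataset("allenai/Molmo2-SynMultiImageQA", "table", split="train")
```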
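To work with several subsets at once, the loading call above can be wrapped in a small helper. The following is a minimal sketch, not part of the dataset's own tooling: `load_subsets`, `SUBSETS`, and the `loader` parameter are hypothetical names introduced here for illustration, and the repository ID is copied from the loading example.

```python
from typing import Callable, Dict, Iterable, Optional

# Repository ID as written in the loading example.
REPO_ID = "allenai/Molmo2-SynMultiImageQA"

# The eight subset (configuration) names listed in this section.
SUBSETS = ["chart", "chemical", "circuit", "diagram",
           "document", "graphic", "music", "table"]


def load_subsets(
    names: Iterable[str] = SUBSETS,
    split: str = "train",
    loader: Optional[Callable] = None,
) -> Dict[str, object]:
    """Load each named subset and return a {subset_name: dataset} mapping.

    `loader` is a hypothetical hook for swapping in a stand-in during
    testing; it defaults to datasets.load_dataset.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without the library.
        from datasets import load_dataset
        loader = load_dataset

    names = list(names)
    unknown = [name for name in names if name not in SUBSETS]
    if unknown:
        raise ValueError(f"Unknown subset(s): {unknown}")

    return {name: loader(REPO_ID, name, split=split) for name in names}
```

For example, `load_subsets(["chart", "table"])` would download only those two subsets, while calling it with no arguments would fetch all eight, which may take considerable time and disk space.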

Data Format

Each row of the dataset contains the following fields:

  • id: the unique ID of each example
  • images: a list of rendered images from the code
  • code: a list of the source code for each image
  • qa_pairs: a list of questions, answers, and chain-of-thought explanations
  • qa_pairs_raw: the raw QA pairs, in which image references (<IMAGE-N>) have not yet been replaced with natural-language references
  • metadata: metadata of each example, including the content type, persona, overall descriptions, and the number of images.
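The field list above can be turned into a quick sanity check, along with a sketch of how raw image references might be rewritten. This is only an illustration under stated assumptions: the helper names are hypothetical, the one-to-one match between `images` and `code` is inferred from the field descriptions, `N` in `<IMAGE-N>` is assumed to be a decimal number, and the "Image N" wording is a placeholder rather than the phrasing actually used in `qa_pairs`.

```python
import re
from typing import List, Mapping

# Field names taken from the list above.
EXPECTED_FIELDS = ["id", "images", "code", "qa_pairs", "qa_pairs_raw", "metadata"]

# Assumption: N in <IMAGE-N> is a decimal number.
IMAGE_REF = re.compile(r"<IMAGE-(\d+)>")


def check_row(row: Mapping) -> List[str]:
    """Return a list of problems found in one row (empty if none)."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in row]
    if "images" in row and "code" in row:
        # Assumption: one source-code entry per rendered image.
        if len(row["images"]) != len(row["code"]):
            problems.append("images and code differ in length")
    return problems


def naturalize_image_refs(text: str, template: str = "Image {n}") -> str:
    """Replace each <IMAGE-N> marker using a placeholder template.

    The default template is hypothetical; the dataset's own natural
    format is not specified in this section.
    """
    return IMAGE_REF.sub(lambda m: template.format(n=m.group(1)), text)


def referenced_images(text: str) -> List[int]:
    """Return the sorted, de-duplicated image numbers mentioned in text."""
    return sorted({int(n) for n in IMAGE_REF.findall(text)})
```

Running `check_row` over a handful of rows before training is a cheap way to confirm that the schema matches the description here, since this page does not show a row-level preview.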

Splits

The data is divided into train and validation splits. These splits are "unofficial" because we do not generally use this data for evaluation. However, they reflect what was used when training the Molmo2 models, which were trained only on the train split.

License

This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines. This dataset includes synthetic images from model outputs using code generated from Claude-Sonnet-4.5, which is subject to Anthropic's Terms of Service. The questions are generated from GPT-5, which is subject to OpenAI's Terms of Use.


Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: hf-dataset--allenai--molmo2-synmultiimageqa
slug: allenai--molmo2-synmultiimageqa
source: huggingface
author: allenai
license: odc-by
tags: task_categories:visual-question-answering, license:odc-by, size_categories:100k<n<1m, format:parquet, modality:image, modality:text, library:datasets, library:dask, library:mlcroissant, library:polars, region:us


Engagement & Metrics

downloads: 15,973
stars: 6
forks: 0

Data indexed from public sources. Updated daily.