📊

Dataset

MMMU

Name: MMMU
Creator: MMMU
License: Apache-2.0

by MMMU hf-dataset--mmmu--mmmu

Nexus Index

44.3 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 59

R: Recency 97

Q: Quality 30


Data Integrity 44.3 FNI Score

Parquet Format

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--mmmu--mmmu
License	Apache-2.0
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__mmmu__mmmu,
  author = {MMMU},
  title = {MMMU Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/mmmu/mmmu}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

MMMU. (2026). MMMU [Dataset]. Free2AITools. https://huggingface.co/datasets/mmmu/mmmu

🔬Technical Deep Dive

⚖️ Nexus Index V2.0

Methodology Index Protocol

💬 Index Insight

FNI V2.0 for MMMU: Semantic (S:50), Authority (A:0), Popularity (P:59), Recency (R:97), Quality (Q:30).

Free2AITools Nexus Index

Verification Authority

HuggingFace API GitHub Metadata Arxiv Citation DB System Audit

Unbiased Data Node Refresh: VFS Live

⬇️

Downloads

97,919

Dataset Specification

MMMU (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)

🔔News

🛠️[2026-04-21]: Fixed option issue in test_Psychology_15.
‼️[2026-02-12]: We have released the answers for the test set! You can now evaluate your models on the test set locally! 🎉
🛠️[2024-05-30]: Fixed duplicate option issues in Materials dataset items (validation_Materials_25; test_Materials_17, 242) and content error in validation_Materials_25.
🛠️[2024-04-30]: Fixed missing "-" or "^" signs in Math dataset items (dev_Math_2, validation_Math_11, 12, 16; test_Math_8, 23, 43, 113, 164, 223, 236, 287, 329, 402, 498) and corrected option errors in validation_Math_2. If you encounter any issues with the dataset, please contact us promptly!
🚀[2024-01-31]: We added Human Expert performance on the Leaderboard!🌟
🔥[2023-12-04]: ~~Our evaluation server for test set is now available on EvalAI.~~ We welcome all submissions and look forward to your participation! 😆

Dataset Details

Dataset Description

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence (AGI).

🎯 We have released a full set comprising 150 development samples, 900 validation samples and 10,500 test samples. The development set is used for few-shot/in-context learning, and the validation set is used for debugging models, selecting hyperparameters, or quick evaluations. ~~The answers and explanations for the test set questions are withheld. You can submit your model's predictions for the test set on EvalAI.~~

The answers and explanations for the test set samples are now released. You can evaluate your models locally!
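As an illustration of how the development split can serve few-shot prompting, the following is a minimal sketch rather than the authors' evaluation code. It assumes each record is a plain dict with the `question`, `options`, and `answer` fields from the schema listed on this page, and that `options` holds a Python-style list literal stored as a string. The helper names `format_question` and `build_few_shot_prompt` are hypothetical.

```python
import ast


def format_question(record, include_answer=False):
    """Render one record as a lettered multiple-choice prompt."""
    # Assumption: `options` is a string such as "['3', '4']"; an empty
    # string or "[]" is treated as an open question with no choices.
    options = ast.literal_eval(record["options"]) if record["options"] else []
    lines = [f"Question: {record['question']}"]
    for i, option in enumerate(options):
        lines.append(f"({chr(ord('A') + i)}) {option}")
    # Solved examples show the gold answer; the target leaves it blank.
    lines.append(f"Answer: {record['answer']}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(dev_records, target_record):
    """Prefix the target question with solved development examples."""
    shots = [format_question(r, include_answer=True) for r in dev_records]
    return "\n\n".join(shots + [format_question(target_record)])
```

Image inputs are left out of this sketch; a multimodal model would also need the images attached to each record, passed in whatever form that model expects.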

Dataset Creation

MMMU was created to challenge multimodal models with tasks that demand college-level subject knowledge and deliberate reasoning, pushing the boundaries of what these models can achieve in terms of expert-level perception and reasoning. The data for the MMMU dataset was manually collected by a team of college students from various disciplines, using online sources, textbooks, and lecture materials.

Content: The dataset contains 11.5K college-level problems across six broad disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, Tech & Engineering) and 30 college subjects.
Image Types: The dataset includes 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures, interleaved with text.

🏆 Mini-Leaderboard

We show a mini-leaderboard here; please find more information in our paper or on our homepage.

Model	Val (900)	Test (10.5K)
Expert (Best)	88.6	-
Expert (Medium)	82.6	-
Expert (Worst)	76.2	-
GPT-4o*	69.1	-
Gemini 1.5 Pro*	62.2	-
InternVL2-Pro*	62.0	55.7
Gemini 1.0 Ultra*	59.4	-
Claude 3 Opus*	59.4	-
GPT-4V(ision) (Playground)	56.8	55.7
Reka Core*	56.3	-
Gemini 1.5 Flash*	56.1	-
SenseChat-Vision-0423-Preview*	54.6	50.3
Reka Flash*	53.3	-
Claude 3 Sonnet*	53.1	-
HPT Pro*	52.0	-
VILA1.5*	51.9	46.9
Qwen-VL-MAX*	51.4	46.8
InternVL-Chat-V1.2*	51.6	46.2
Skywork-VL*	51.4	46.2
LLaVA-1.6-34B*	51.1	44.7
Claude 3 Haiku*	50.2	-
Adept Fuyu-Heavy*	48.3	-
Gemini 1.0 Pro*	47.9	-
Marco-VL-Plus*	46.2	44.3
Yi-VL-34B*	45.9	41.6
Qwen-VL-PLUS*	45.2	40.8
HPT Air*	44.0	-
Reka Edge*	42.8	-
Marco-VL*	41.2	40.4
OmniLMM-12B*	41.1	40.4
Bunny-8B*	43.3	39.0
Bunny-4B*	41.4	38.4
Weitu-VL-1.0-15B*	-	38.4
InternLM-XComposer2-VL*	43.0	38.2
Yi-VL-6B*	39.1	37.8
InfiMM-Zephyr-7B*	39.4	35.5
InternVL-Chat-V1.1*	39.1	35.3
Math-LLaVA-13B*	38.3	34.6
SVIT*	38.0	34.1
MiniCPM-V*	37.2	34.1
MiniCPM-V-2*	37.1	-
Emu2-Chat*	36.3	34.1
BLIP-2 FLAN-T5-XXL	35.4	34.0
InstructBLIP-T5-XXL	35.7	33.8
LLaVA-1.5-13B	36.4	33.6
Bunny-3B*	38.2	33.0
Qwen-VL-7B-Chat	35.9	32.9
SPHINX*	32.9	32.9
mPLUG-OWL2*	32.7	32.1
BLIP-2 FLAN-T5-XL	34.4	31.0
InstructBLIP-T5-XL	32.9	30.6
Gemini Nano2*	32.6	-
CogVLM	32.1	30.1
Otter	32.2	29.1
LLaMA-Adapter2-7B	29.8	27.7
MiniGPT4-Vicuna-13B	26.8	27.6
Adept Fuyu-8B	27.9	27.4
Kosmos2	24.4	26.6
OpenFlamingo2-9B	28.7	26.3
Frequent Choice	22.1	23.9
Random Choice	26.8	25.8

*: results provided by the authors.
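Scores like those in the table are accuracy percentages. The sketch below shows one way to compute overall and per-subject accuracy locally, along with simplified stand-ins for the Random Choice and Frequent Choice rows. It is an illustration under stated assumptions, not the official scoring script: predictions and answers are assumed to be dicts keyed by sample id, ids are assumed to follow the `split_Subject_index` pattern seen in the news items (for example `validation_Materials_25`), and all function names are hypothetical. The authors' exact baseline procedures may differ.

```python
import random
from collections import Counter, defaultdict


def subject_of(sample_id):
    """Extract the subject from an id such as 'validation_Materials_25'."""
    _split, rest = sample_id.split("_", 1)
    subject, _index = rest.rsplit("_", 1)
    return subject


def accuracy(predictions, answers):
    """Percentage of ids in `answers` whose prediction matches exactly."""
    correct = sum(predictions.get(i) == a for i, a in answers.items())
    return 100.0 * correct / len(answers)


def per_subject_accuracy(predictions, answers):
    """Group gold answers by subject, then score each group separately."""
    by_subject = defaultdict(dict)
    for sample_id, answer in answers.items():
        by_subject[subject_of(sample_id)][sample_id] = answer
    return {s: accuracy(predictions, a) for s, a in by_subject.items()}


def frequent_choice(answers):
    """Simplified baseline: always predict the most common gold answer."""
    most_common = Counter(answers.values()).most_common(1)[0][0]
    return {i: most_common for i in answers}


def random_choice(answers, letters="ABCD", seed=0):
    """Simplified baseline: predict a uniformly random letter per sample."""
    rng = random.Random(seed)
    return {i: rng.choice(letters) for i in answers}
```

Exact string matching is only suitable for multiple-choice letters; open-ended answers would need a more tolerant comparison.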

Limitations

Despite its comprehensive nature, MMMU, like any benchmark, has limitations. The manual curation process, albeit thorough, may carry biases, and the focus on college-level subjects might not be a fully sufficient test for Expert AGI. However, we believe strong performance on MMMU should be necessary for an Expert AGI to demonstrate its broad and deep subject knowledge as well as expert-level understanding and reasoning capabilities. In future work, we plan to incorporate human evaluations into MMMU. This will provide a more grounded comparison between model capabilities and expert performance, shedding light on the proximity of current AI systems to achieving Expert AGI.

Disclaimers

The guidelines for the annotators emphasized strict compliance with copyright and licensing rules from the initial data source, specifically avoiding materials from websites that forbid copying and redistribution. Should you encounter any data samples potentially breaching the copyright or licensing regulations of any site, we encourage you to notify us. Upon verification, such samples will be promptly removed.

Contact

Xiang Yue: [email protected]
Yu Su: [email protected]
Wenhu Chen: [email protected]

Citation

BibTeX:

@inproceedings{yue2023mmmu,
  title={MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI},
  author={Xiang Yue and Yuansheng Ni and Kai Zhang and Tianyu Zheng and Ruoqi Liu and Ge Zhang and Samuel Stevens and Dongfu Jiang and Weiming Ren and Yuxuan Sun and Cong Wei and Botao Yu and Ruibin Yuan and Renliang Sun and Ming Yin and Boyuan Zheng and Zhenzhu Yang and Yibo Liu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
  booktitle={Proceedings of CVPR},
  year={2024},
}

📊 Structured Schema (Zero-Fabrication)

Feature Key	Data Type
`id`	`string`
`question`	`string`
`options`	`string`
`explanation`	`string`
`image_1`	`Image`
`image_2`	`Image`
`image_3`	`Image`
`image_4`	`Image`
`image_5`	`Image`
`image_6`	`Image`
`image_7`	`Image`
`img_type`	`string`
`answer`	`string`
`topic_difficulty`	`string`
`question_type`	`string`
`subfield`	`string`
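Because the schema spreads images over seven separate fields, a consumer typically has to work out which slots are populated for a given record. The sketch below is one hedged way to do that on plain dict records. It assumes unused image fields are `None` and that question and option text refers to images with placeholders of the form `<image N>`; that placeholder syntax is an assumption not stated on this page, and the helper names are hypothetical.

```python
import re

# Field names taken from the schema table: image_1 ... image_7.
IMAGE_FIELDS = [f"image_{n}" for n in range(1, 8)]

# Assumed placeholder syntax inside `question` and `options` text.
PLACEHOLDER = re.compile(r"<image (\d+)>")


def attached_images(record):
    """Return {slot_number: image} for the image fields that are populated."""
    return {
        n: record[field]
        for n, field in enumerate(IMAGE_FIELDS, start=1)
        if record.get(field) is not None
    }


def referenced_slots(record):
    """Return the sorted slot numbers mentioned in the question or options."""
    text = f"{record.get('question', '')} {record.get('options', '')}"
    return sorted({int(n) for n in PLACEHOLDER.findall(text)})


def missing_images(record):
    """List slots that the text references but that carry no image."""
    have = attached_images(record)
    return [n for n in referenced_slots(record) if n not in have]
```

A non-empty result from `missing_images` flags a record worth inspecting by hand before it is fed to a model.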

Estimated Rows: 415

Social Proof

HuggingFace Hub

97.9K Downloads

Hub Discussions

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--mmmu--mmmu
slug: mmmu--mmmu
source: huggingface
author: MMMU
license: Apache-2.0
tags: task_categories:question-answering, task_categories:visual-question-answering, task_categories:multiple-choice, language:en, license:apache-2.0, size_categories:10k<n<100k, format:parquet, modality:image, modality:text, library:datasets, library:dask, library:mlcroissant, library:polars, arxiv:2311.16502, region:us, biology, medical, finance, chemistry, music, art, art_theory, design, business, accounting, economics, manage, marketing, health, medicine, basic_medical_science, clinical, pharmacy,
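The tag list mixes namespaced entries such as `task_categories:question-answering` with bare subject keywords such as `biology`. As a small sketch of how the two kinds could be separated, the function below groups tags by the prefix before the first colon. The `topics` bucket name for bare tags and the function name `group_tags` are illustrative choices, not part of the upstream metadata.

```python
def group_tags(tag_string):
    """Group a comma-separated tag string by its 'namespace:' prefix."""
    grouped = {}
    for tag in (t.strip() for t in tag_string.split(",")):
        if not tag:
            continue  # tolerate a trailing comma or empty entries
        key, sep, value = tag.partition(":")
        if sep:
            grouped.setdefault(key, []).append(value)
        else:
            # Bare keywords go into a hypothetical "topics" bucket.
            grouped.setdefault("topics", []).append(tag)
    return grouped
```

For example, `group_tags("language:en, biology,")` yields one `language` entry and one `topics` entry, ignoring the trailing comma.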

📊 Engagement & Metrics

downloads: 97,919
stars: 325
forks: 0

Data indexed from public sources. Updated daily.
