Dataset

UltiMath

by Datamuncher Labs
Nexus Index: 37.4 (Top 100%)
S / A / P / R / Q breakdown: calibration pending

Pillar scores are computed during the next indexing cycle.

Entity Passport
Registry ID hf-dataset--datamuncher-labs--ultimath
License CC-BY-SA-4.0
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__datamuncher_labs__ultimath,
  author = {Datamuncher Labs},
  title = {UltiMath Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/datamuncher-labs/ultimath}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Datamuncher Labs. (2026). UltiMath [Dataset]. Free2AITools. https://huggingface.co/datasets/datamuncher-labs/ultimath

🔬 Technical Deep Dive

⚖️ Nexus Index V2.0

37.4
ESTIMATED IMPACT TIER
Semantic (S) 0
Authority (A) 0
Popularity (P) 0
Recency (R) 0
Quality (Q) 0

💬 Index Insight

FNI V2.0 for UltiMath: Semantic (S:0), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).

Downloads
18,617


Dataset Specification

Dataset Card for UltiMath

UltiMath is a large-scale synthetic dataset containing ~33 billion math reasoning examples, designed to enhance arithmetic and symbolic reasoning in large language models (LLMs).

Dataset Details

Dataset Description

  • Curated by: Roman
  • Funded by: no funding used
  • Shared by: Roman (uploads via API)
  • License: CC BY-SA 4.0

Dataset Sources

Code generated.

Uses

Designed to improve multi-step arithmetic, algebraic manipulation, equation solving, and symbolic reasoning during pretraining or continued pretraining of LLMs.

Direct Use

The dataset is intended to be part of a pretraining corpus. It can also be used later for fine-tuning.

Out-of-Scope Use

Redistribution is permitted under CC BY-SA 4.0, provided proper attribution is given to the original creator (Roman). Mirroring without credit or claiming authorship is not allowed.

Dataset Structure

Each shard contains 5,000,000 rows. Each row follows this pattern:

  • "problem": "What is 42 + 17?",
  • "steps": "Add 42 and 17.",
  • "explanation": "Addition sums two integers.",
  • "answer": "59",
  • "difficulty": "easy"
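
A minimal sketch of how a row in this shape could be checked before training. The five field names come from the example row above; the function name `validate_record` and the assumption that every field is a non-empty string are illustrative and not stated in the dataset card.

```python
from typing import Mapping

# Field names taken from the example row in the dataset card.
EXPECTED_FIELDS = ("problem", "steps", "explanation", "answer", "difficulty")


def validate_record(record: Mapping[str, object]) -> list[str]:
    """Return a list of problems found in one row (empty list means OK).

    Assumption: every field is a non-empty string, as in the card's example.
    """
    issues = []
    for field in EXPECTED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str):
            issues.append(f"{field}: missing or not a string")
        elif not value.strip():
            issues.append(f"{field}: empty")
    return issues


example = {
    "problem": "What is 42 + 17?",
    "steps": "Add 42 and 17.",
    "explanation": "Addition sums two integers.",
    "answer": "59",
    "difficulty": "easy",
}
print(validate_record(example))  # → []
```

Since the card states that the shards are uploaded without filtering, a check along these lines may be worth running on a sample of rows before committing to a full training run.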

Dataset Creation

The dataset contains ~3.7T tokens in total.
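
The headline figures in this card can be combined into rough per-example and per-shard estimates. The sketch below only does that arithmetic, assuming the quoted totals (~33 billion examples, ~3.7 trillion tokens, 5,000,000 rows per shard) are accurate; the card does not say which tokenizer the token count refers to.

```python
# Figures quoted in the dataset card (all approximate).
TOTAL_EXAMPLES = 33e9       # ~33 billion rows
TOTAL_TOKENS = 3.7e12       # ~3.7 trillion tokens
ROWS_PER_SHARD = 5_000_000  # rows in each shard

tokens_per_example = TOTAL_TOKENS / TOTAL_EXAMPLES
shard_count = TOTAL_EXAMPLES / ROWS_PER_SHARD
tokens_per_shard = tokens_per_example * ROWS_PER_SHARD

print(f"~{tokens_per_example:.0f} tokens per example")
print(f"~{shard_count:,.0f} shards")
print(f"~{tokens_per_shard / 1e6:.0f}M tokens per shard")
```

Under these assumptions each example averages roughly 112 tokens, and the full dataset spans about 6,600 shards of roughly 561M tokens each.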

Curation Rationale

I made this dataset because I don't think there is enough synthetic math reasoning data in the open-source ML community.

Source Data

I used a python script to automatically generate the data.
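
The generation script itself is not published in this card. As a purely hypothetical illustration of template-based generation, the sketch below produces addition rows in the five-field format shown under Dataset Structure. The function name, the operand range, and the rule that assigns "easy" or "medium" are all assumptions, not the author's method.

```python
import random


def make_addition_record(rng: random.Random) -> dict:
    """Build one addition example in the five-field row format.

    Hypothetical template: the operand range and the difficulty rule
    are illustrative choices, not taken from the dataset card.
    """
    a = rng.randint(0, 999)
    b = rng.randint(0, 999)
    return {
        "problem": f"What is {a} + {b}?",
        "steps": f"Add {a} and {b}.",
        "explanation": "Addition sums two integers.",
        "answer": str(a + b),
        "difficulty": "easy" if max(a, b) < 100 else "medium",
    }


rng = random.Random(0)  # fixed seed so repeated runs give the same rows
records = [make_addition_record(rng) for _ in range(3)]
for record in records:
    print(record["problem"], "->", record["answer"])
```

Because the answer is computed from the same operands that fill the template, rows built this way are correct by construction, which is one reason a generated corpus can plausibly be uploaded without a filtering pass.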

Data Collection and Processing

No filtering is applied to the data; it is a raw upload of the generated shards.

Who are the source data producers?

I am the creator of this dataset.

Personal and Sensitive Information

Unless math is personally offended that I mass-produced variants of equations and then uploaded them, there is no sensitive or personal information.

Bias, Risks, and Limitations

The data is synthetically generated with uniform sampling across templates, but it may underrepresent advanced topics (e.g., calculus, proofs) and non-English contexts. Overuse during training may lead to arithmetic overfitting or reduced fluency in non-math tasks. For the dataset's intended purpose, no other limitations are currently known.

Recommendations

Given the points above, I think this dataset is best suited to research. However, if you want to use it with a smaller model, use only a few shards, since the dataset is very large.
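
One way to follow the "few shards" advice is to pick a small, reproducible subset of shard files and load only those. The sketch below is a minimal example under stated assumptions: the helper `pick_shards` and the shard file names are hypothetical, and the real file layout in the repository may differ.

```python
import random
from typing import Sequence


def pick_shards(shard_files: Sequence[str], k: int, seed: int = 0) -> list[str]:
    """Return k shard paths chosen reproducibly (all of them if k is too large)."""
    rng = random.Random(seed)
    k = min(k, len(shard_files))
    return sorted(rng.sample(list(shard_files), k))


# Hypothetical file names; the real shard naming in the repository may differ.
all_shards = [f"shard_{i:05d}.parquet" for i in range(100)]
subset = pick_shards(all_shards, k=3)
print(subset)

# The subset could then be passed to the Hugging Face `datasets` library, e.g.:
#   from datasets import load_dataset
#   ds = load_dataset("DataMuncher-Labs/UltiMath", data_files=subset,
#                     split="train", streaming=True)
```

Fixing the seed keeps the chosen shards stable across runs, which makes experiments on the subset easier to reproduce.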

Dataset Card Contact

email - [[email protected]] Please be respectful and avoid spamming my email. Thank you in advance.

BibTeX

@dataset{ultimath2025,
  title = {UltiMath: Large-Scale Synthetic Math Reasoning Dataset},
  author = {DataMuncher-Labs},
  year = {2025},
  license = {CC BY-SA 4.0},
  url = {https://huggingface.co/datasets/DataMuncher-Labs/UltiMath}
}


🛡️ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: hf-dataset--datamuncher-labs--ultimath
slug: datamuncher-labs--ultimath
source: huggingface
author: Datamuncher Labs
license: CC-BY-SA-4.0
tags: task_categories:text-generation, language:en, license:cc-by-sa-4.0, size_categories:10b<n<100b, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, region:us, math, logic, arithmatic, en, english, reasoning, reason, doi:10.57967/hf/7843


📊 Engagement & Metrics

downloads: 18,617
stars: 41
forks: 0
