📊

Dataset

Dolma3 Dolmino Mix 100b 1125

Name: Dolma3 Dolmino Mix 100b 1125
Creator: allenai
License: odc-by


Nexus Index

26.0 Top 1%


Data Integrity 26 FNI Score

Format: Parquet

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--allenai--dolma3_dolmino_mix-100b-1125
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__allenai__dolma3_dolmino_mix_100b_1125,
  author = {allenai},
  title = {Dolma3 Dolmino Mix 100b 1125 Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/allenai/dolma3_dolmino_mix-100B-1125}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

allenai. (2026). Dolma3 Dolmino Mix 100b 1125 [Dataset]. Free2AITools. https://huggingface.co/datasets/allenai/dolma3_dolmino_mix-100B-1125

🔬Technical Deep Dive


⚖️ Nexus Index V2.0

Methodology Index Protocol

26.0

ESTIMATED IMPACT TIER

Semantic (S) 50

Authority (A) 0

Popularity (P) 0

Recency (R) 0

Quality (Q) 0



⬇️

Downloads

154,939


Dataset Specification

license: odc-by
language: en

Dolma 3 Dolmino dataset pool for Olmo 3 stage 2 annealing training

This dataset contains the high-quality pool of data considered for the second stage of Olmo 3 32B.
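Because a row-level preview is not available here, one way to inspect the pool is to stream a few records from the Hugging Face Hub. The following is a minimal sketch, not an official loading recipe: it assumes the `datasets` library is installed, that the repository ID matches the URL given in the citation block, and that a split named `train` exists (the split name is an assumption). The helper `peek` is a hypothetical name introduced for illustration.

```python
from itertools import islice

# Repository ID taken from the dataset URL cited on this page.
REPO_ID = "allenai/dolma3_dolmino_mix-100B-1125"


def peek(n=3, split="train"):
    """Stream the first n records without downloading the whole dataset.

    The split name defaults to "train", which is an assumption; adjust it
    if the repository uses a different layout.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek(2)` and printing `sorted(record)` for each returned record lists the field names, which is a safer first step than assuming a particular schema.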

Dataset Sources

Source	Category
TinyMATH Mind	Math (synth)
TinyMATH PoT	Math (synth)
CraneMath	Math (synth)
MegaMatt	Math (synth)
Dolmino Math	Math (synth)
StackEdu (FIM)	Code
CraneCode	Python (synth)
Reddit To Flashcards	QA (synth)
Wiki To RCQA	QA (synth)
Nemotron Synth QA	QA (synth)
Math Meta-Reasoning	Thinking (synth)
Code Meta-Reasoning	Thinking (synth)
Program-Verifiable	Thinking (synth)
OMR Rewrite FullThoughts	Thinking (synth)
QWQ Reasoning Traces	Thinking (synth)
General Reasoning Mix	Thinking (synth)
Gemini Reasoning Traces	Thinking (synth)
Llama Nemotron Reasoning Traces	Thinking (synth)
OpenThoughts2 Reasoning Traces	Thinking (synth)
Tulu 3 SFT	Instruction (synth)
Dolmino 1 Flan	Instruction (synth)
OLMOCR Science PDFs (High Q.)	PDFs
STEM-Heavy Crawl	Web pages
Common Crawl (High Q.)	Web pages
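
The table above can be summarized by counting sources per category. This is a small illustrative sketch: the rows are copied from the table, while the names `SOURCES`, `count_by_category`, and `synthetic_sources` are introduced here and are not part of the dataset or any library.

```python
from collections import Counter

# (source, category) pairs copied from the Dataset Sources table.
SOURCES = [
    ("TinyMATH Mind", "Math (synth)"),
    ("TinyMATH PoT", "Math (synth)"),
    ("CraneMath", "Math (synth)"),
    ("MegaMatt", "Math (synth)"),
    ("Dolmino Math", "Math (synth)"),
    ("StackEdu (FIM)", "Code"),
    ("CraneCode", "Python (synth)"),
    ("Reddit To Flashcards", "QA (synth)"),
    ("Wiki To RCQA", "QA (synth)"),
    ("Nemotron Synth QA", "QA (synth)"),
    ("Math Meta-Reasoning", "Thinking (synth)"),
    ("Code Meta-Reasoning", "Thinking (synth)"),
    ("Program-Verifiable", "Thinking (synth)"),
    ("OMR Rewrite FullThoughts", "Thinking (synth)"),
    ("QWQ Reasoning Traces", "Thinking (synth)"),
    ("General Reasoning Mix", "Thinking (synth)"),
    ("Gemini Reasoning Traces", "Thinking (synth)"),
    ("Llama Nemotron Reasoning Traces", "Thinking (synth)"),
    ("OpenThoughts2 Reasoning Traces", "Thinking (synth)"),
    ("Tulu 3 SFT", "Instruction (synth)"),
    ("Dolmino 1 Flan", "Instruction (synth)"),
    ("OLMOCR Science PDFs (High Q.)", "PDFs"),
    ("STEM-Heavy Crawl", "Web pages"),
    ("Common Crawl (High Q.)", "Web pages"),
]


def count_by_category(rows):
    """Return a Counter mapping each category to its number of sources."""
    return Counter(category for _, category in rows)


category_counts = count_by_category(SOURCES)

# Sources whose category label carries the "(synth)" marker in the table.
synthetic_sources = [name for name, cat in SOURCES if cat.endswith("(synth)")]

if __name__ == "__main__":
    for category, n in category_counts.most_common():
        print(f"{category}: {n}")
    print(f"{len(synthetic_sources)} of {len(SOURCES)} sources are marked (synth)")
```

These counts describe the number of listed sources only; the page does not state how many tokens each source contributes.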

Ingredients

Two ingredients were used during the stage 2 midtraining annealing of Olmo 3 32B. Each is a separate version of a 100B-token mix:

Ingredient 1
- 100B tokens
- Mix composition: web pages, code, math/QA/thinking/instruction/PDFs
Ingredient 2
- 100B tokens
- Mix composition: web pages, code, math/QA/thinking/instruction/PDFs

Licensing Information

Dolma 3 Dolmino is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

Citation

@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}


🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


🛡️ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: hf-dataset--allenai--dolma3_dolmino_mix-100b-1125
source: huggingface
author: allenai
tags: language:en, license:odc-by, arxiv:2512.13961, region:us

⚙️ Technical Specs

architecture: null
tokens (billions): 100
context length: 4,096

📊 Engagement & Metrics

likes: 17
downloads: 154,939
