Dolma3 Dolmino Pool
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--allenai--dolma3_dolmino_pool |
| License | odc-by |
| Provider | huggingface |
Dataset Specification
⚠️ IMPORTANT NOTICE ⚠️
This is the Dolma 3 Dolmino *pool*; it has not been mixed into final training proportions.
Dolma 3 Dolmino is the dataset pool for stage 2 (annealing) training of Olmo 3: it contains the high-quality pool of data considered for the second training stage of Olmo 3 7B.
Dataset Sources
| Source | Category | Tokens | Documents |
|---|---|---|---|
| TinyMATH Mind | Math (synth) | 899M | 1.42M |
| TinyMATH PoT | Math (synth) | 241M | 729K |
| CraneMath | Math (synth) | 5.62B | 6.55M |
| MegaMatt | Math (synth) | 3.88B | 6.79M |
| Dolmino Math | Math (synth) | 10.7B | 21M |
| StackEdu (FIM) | Code | 21.4B | 32M |
| CraneCode | Python (synth) | 18.8B | 19.7M |
| Reddit To Flashcards | QA (synth) | 21.6B | 370M |
| Wiki To RCQA | QA (synth) | 4.22B | 22.3M |
| Nemotron Synth QA | QA (synth) | 487B | 972M |
| Math Meta-Reasoning | Thinking (synth) | 1.05B | 984K |
| Code Meta-Reasoning | Thinking (synth) | 1.27B | 910K |
| Program-Verifiable | Thinking (synth) | 438M | 384K |
| OMR Rewrite FullThoughts | Thinking (synth) | 850M | 291K |
| QWQ Reasoning Traces | Thinking (synth) | 4.77B | 438K |
| General Reasoning Mix | Thinking (synth) | 2.48B | 668K |
| Gemini Reasoning Traces | Thinking (synth) | 246M | 55.2K |
| Llama Nemotron Reasoning Traces | Thinking (synth) | 20.9B | 3.91M |
| OpenThoughts2 Reasoning Traces | Thinking (synth) | 5.6B | 1.11M |
| Tulu 3 SFT | Instruction (synth) | 1.61B | 1.95M |
| Dolmino 1 Flan | Instruction (synth) | 16.8B | 56.9M |
| OLMOCR Science PDFs (High Q.) | PDFs | 240B | 28.7M |
| STEM-Heavy Crawl | Web pages | 5.21B | 5.16M |
| Common Crawl (High Q.) | Web pages | 1.32T | 965M |
| Total | - | 2.19T | 2.52B |
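The per-source token counts can be sanity-checked against the stated 2.19T total with a few lines of Python. This is purely illustrative; the figures are transcribed from the table above (in billions of tokens, one entry per source row, top to bottom):

```python
# Token counts per source in billions, transcribed from the sources table,
# in row order from TinyMATH Mind down to Common Crawl (High Q.).
tokens_b = [0.899, 0.241, 5.62, 3.88, 10.7, 21.4, 18.8, 21.6, 4.22, 487,
            1.05, 1.27, 0.438, 0.85, 4.77, 2.48, 0.246, 20.9, 5.6, 1.61,
            16.8, 240, 5.21, 1320]

total_t = sum(tokens_b) / 1000  # billions -> trillions
print(f"{total_t:.2f}T")  # rounds to 2.20T, consistent with the stated 2.19T
```

The small discrepancy against the stated total comes from each row already being rounded to three significant figures.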
Mix Compositions
| Source | 10B Source % | 10B Mix % | 100B Source % | 100B Mix % |
|---|---|---|---|---|
| STEM-Heavy Crawl | - | - | 5.0% | 5.0% |
| StackEdu (FIM) | - | - | 10.0% | 10.0% |
| CraneCode | - | - | 10.0% | 10.0% |
| CraneMath | - | - | 5.63% | 5.63% |
| MegaMatt | - | - | 1.73% | 1.73% |
| Dolmino Math | - | - | 10.7% | 10.7% |
| OMR Rewrite FullThoughts | - | - | 0.85% | 0.85% |
| TinyMATH Mind | - | - | 0.9% | 0.9% |
| TinyMATH PoT | - | - | 0.24% | 0.24% |
| Reddit To Flashcards | - | - | 5.9% | 5.9% |
| Wiki To RCQA | - | - | 3.0% | 3.0% |
| Nemotron Synth QA | - | - | 5.0% | 5.0% |
| Tulu 3 SFT | - | - | 1.1% | 1.1% |
| Dolmino 1 Flan | - | - | 5.0% | 5.0% |
| QWQ Reasoning Traces | - | - | 1.87% | 1.87% |
| Gemini Reasoning Traces | - | - | 0.25% | 0.25% |
| Llama Nemotron Reasoning Traces | - | - | 1.25% | 1.25% |
| OpenThoughts2 Reasoning Traces | - | - | 1.25% | 1.25% |
| Program-Verifiable | - | - | 0.16% | 0.16% |
| Math Meta-Reasoning | - | - | 0.38% | 0.38% |
| Code Meta-Reasoning | - | - | 0.46% | 0.46% |
| General Reasoning Mix | - | - | 1.87% | 1.87% |
| OLMOCR Science PDFs (High Q.) | - | - | 5.0% | 5.0% |
| Common Crawl (High Q.) | - | - | 22.5% | 22.5% |
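As a quick consistency check on the 100B mix, the Mix % column should sum to roughly 100%. The percentages below are transcribed from the table (one entry per row, top to bottom); the sampling-fraction example assumes Mix % applies to a 100B-token budget, which is an interpretation, not something the table states explicitly:

```python
# 100B-mix percentages transcribed from the Mix Compositions table,
# in row order from STEM-Heavy Crawl down to Common Crawl (High Q.).
mix_pct_100b = [5.0, 10.0, 10.0, 5.63, 1.73, 10.7, 0.85, 0.9, 0.24, 5.9,
                3.0, 5.0, 1.1, 5.0, 1.87, 0.25, 1.25, 1.25, 0.16, 0.38,
                0.46, 1.87, 5.0, 22.5]

print(round(sum(mix_pct_100b), 2))  # ~100 (row rounding leaves it slightly off)

# Implied sampling fraction for one source, assuming a 100B-token budget:
# StackEdu (FIM) at 10% contributes 10B tokens drawn from a 21.4B-token pool.
budget_b = 0.10 * 100   # tokens taken into the mix, in billions
pool_b = 21.4           # pool size from the sources table, in billions
print(f"{budget_b / pool_b:.2f}")  # fraction of the pool used, ~0.47
```

So under this reading, the 100B mix uses a bit under half of the StackEdu pool, i.e. less than one epoch over that source.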
Licensing Information
Dolma 3 Dolmino is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.
Citation
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
| Field | Value |
|---|---|
| id | hf-dataset--allenai--dolma3_dolmino_pool |
| slug | allenai--dolma3_dolmino_pool |
| source | huggingface |
| author | allenai |
| license | odc-by |
| tags | task_categories:text-generation, language:en, license:odc-by, arxiv:2512.13961, region:us |
Engagement & Metrics
| Metric | Value |
|---|---|
| downloads | 96,118 |
| stars | 8 |
| forks | 0 |