Dolma3 Dolmino Mix 100b 1125
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--allenai--dolma3_dolmino_mix-100b-1125 |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__allenai__dolma3_dolmino_mix_100b_1125,
author = {allenai},
title = {Dolma3 Dolmino Mix 100b 1125 Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/allenai/dolma3_dolmino_mix-100B-1125}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Technical Deep Dive
Nexus Index V2.0
Index Insight
FNI V2.0 for Dolma3 Dolmino Mix 100b 1125: Semantic (S:50), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).
Dataset Specification
license: odc-by
language:
- en
Dolma 3 Dolmino dataset pool for Olmo 3 stage 2 annealing training
This dataset contains the high-quality pool of data considered for the second stage of Olmo 3 32B.
Dataset Sources
| Source | Category |
|---|---|
| TinyMATH Mind | Math (synth) |
| TinyMATH PoT | Math (synth) |
| CraneMath | Math (synth) |
| MegaMatt | Math (synth) |
| Dolmino Math | Math (synth) |
| StackEdu (FIM) | Code |
| CraneCode | Python (synth) |
| Reddit To Flashcards | QA (synth) |
| Wiki To RCQA | QA (synth) |
| Nemotron Synth QA | QA (synth) |
| Math Meta-Reasoning | Thinking (synth) |
| Code Meta-Reasoning | Thinking (synth) |
| Program-Verifiable | Thinking (synth) |
| OMR Rewrite FullThoughts | Thinking (synth) |
| QWQ Reasoning Traces | Thinking (synth) |
| General Reasoning Mix | Thinking (synth) |
| Gemini Reasoning Traces | Thinking (synth) |
| Llama Nemotron Reasoning Traces | Thinking (synth) |
| OpenThoughts2 Reasoning Traces | Thinking (synth) |
| Tulu 3 SFT | Instruction (synth) |
| Dolmino 1 Flan | Instruction (synth) |
| OLMOCR Science PDFs (High Q.) | PDFs |
| STEM-Heavy Crawl | Web pages |
| Common Crawl (High Q.) | Web pages |
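As a quick way to see how the pool is distributed across categories, the table above can be tallied in a few lines of Python. This is only an illustrative sketch: the names `SOURCES` and `count_by_category` are hypothetical, and the result counts sources per category, not tokens (the card does not list per-source token shares).

```python
from collections import Counter

# Source/category pairs copied from the "Dataset Sources" table.
# SOURCES is a hypothetical name used only for this illustration.
SOURCES = [
    ("TinyMATH Mind", "Math (synth)"),
    ("TinyMATH PoT", "Math (synth)"),
    ("CraneMath", "Math (synth)"),
    ("MegaMatt", "Math (synth)"),
    ("Dolmino Math", "Math (synth)"),
    ("StackEdu (FIM)", "Code"),
    ("CraneCode", "Python (synth)"),
    ("Reddit To Flashcards", "QA (synth)"),
    ("Wiki To RCQA", "QA (synth)"),
    ("Nemotron Synth QA", "QA (synth)"),
    ("Math Meta-Reasoning", "Thinking (synth)"),
    ("Code Meta-Reasoning", "Thinking (synth)"),
    ("Program-Verifiable", "Thinking (synth)"),
    ("OMR Rewrite FullThoughts", "Thinking (synth)"),
    ("QWQ Reasoning Traces", "Thinking (synth)"),
    ("General Reasoning Mix", "Thinking (synth)"),
    ("Gemini Reasoning Traces", "Thinking (synth)"),
    ("Llama Nemotron Reasoning Traces", "Thinking (synth)"),
    ("OpenThoughts2 Reasoning Traces", "Thinking (synth)"),
    ("Tulu 3 SFT", "Instruction (synth)"),
    ("Dolmino 1 Flan", "Instruction (synth)"),
    ("OLMOCR Science PDFs (High Q.)", "PDFs"),
    ("STEM-Heavy Crawl", "Web pages"),
    ("Common Crawl (High Q.)", "Web pages"),
]


def count_by_category(sources):
    """Return a Counter mapping each category to its number of sources."""
    return Counter(category for _, category in sources)


counts = count_by_category(SOURCES)
# Print categories from most to least represented (by number of sources).
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

By this count, "Thinking (synth)" contributes the most sources (9 of 24), followed by "Math (synth)" with 5.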
Ingredients
Two ingredients were used during stage 2 midtraining annealing of Olmo 3 32B, each a separate version of a 100B-token mix:
- Ingredient 1
  - 100B tokens
  - Mix composition: web pages, code, math/QA/thinking/instruction/PDFs
- Ingredient 2
  - 100B tokens
  - Mix composition: web pages, code, math/QA/thinking/instruction/PDFs
Licensing Information
Dolma 3 Dolmino is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.
Citation
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
Dataset Transparency Report
Verified data manifest for traceability and transparency.
Identity & Source
- id: hf-dataset--allenai--dolma3_dolmino_mix-100b-1125
- source: huggingface
- author: allenai
- tags: language:en, license:odc-by, arxiv:2512.13961, region:us
Technical Specs
- architecture: null
- params billions: 100
- context length: 4,096
Engagement & Metrics
- likes: 17
- downloads: 154,939