Dolma3 Longmino Pool

by allenai
⚠️ **IMPORTANT NOTICE** ⚠️

This is the Dolma 3 Longmino **pool**; it hasn't been mixed. If you are interested in *the data used* to train:

- Olmo 3 7B: **allenai/dolma3_longmino_mix-50B-1025**
- Olmo 3 32B: **allenai/dolma3_dolmino_mix-100B-1125**

Dataset Specification

license: odc-by
language:
- en
configs:
- config_name: default
  data_files:
  - split: train
    path: data/**/*
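The `path: data/**/*` entry assigns every file under `data/` to the single `train` split. As a rough illustration of which repository paths that glob selects, here is a minimal stdlib sketch. It assumes the common convention that `**/` matches zero or more directory levels and `*` stays within one path segment; the helper names (`glob_to_regex`, `in_train_split`) and the example file names are hypothetical, and this is a simplified stand-in, not the resolver the Hugging Face `datasets` library actually uses.

```python
import re

# The pattern from the card's `data_files` entry for the `train` split.
TRAIN_GLOB = "data/**/*"


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into an anchored regex (hypothetical helper).

    Assumed semantics, not taken from the `datasets` library:
      - "**/" matches zero or more whole directory levels;
      - "*" matches any run of characters inside a single path segment.
    Character classes ("[...]") and "?" are not supported.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more "dir/" levels
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")  # stays within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    # re.match anchors at the start; \Z anchors at the end.
    return re.compile("".join(parts) + r"\Z")


TRAIN_RE = glob_to_regex(TRAIN_GLOB)


def in_train_split(path: str) -> bool:
    """Return True if a repo-relative path would fall under the train split."""
    return TRAIN_RE.match(path) is not None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for p in [
        "data/lc-s2pdf_32k-64k/shard_000.jsonl.zst",
        "data/shard_000.parquet",
        "README.md",
    ]:
        print(p, in_train_split(p))
```

Under these assumptions, files at any depth below `data/` land in `train`, while top-level repository files such as the README do not.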


Dolma 3 Longmino Pool (639B)

Dolma 3 Longmino Pool is the full pool of documents considered for stage 3 (long-context extension) training of Olmo 3 7B.

Dataset Sources

| Source | Type | Tokens | Docs |
|---|---|---|---|
| LC-s2pdf-REX 32k-64k | Synth PDFs | 24.1B | 492K |
| LC-s2pdf-CWE 32k-64k | Synth PDFs | 8.77B | 189K |
| LC-s2pdf 32k-64k | PDFs | 106B | 2.30M |
| LC-s2pdf 8k-32k (8-16k) | PDFs | 144B | 12.7M |
| LC-s2pdf 8k-32k (16-32k) | PDFs | 115B | 5.06M |
| LC-s2pdf 64k-128k | PDFs | 96.0B | 1.05M |
| LC-s2pdf 128k-256k | PDFs | 60.8B | 342K |
| LC-s2pdf 256k-512k | PDFs | 35.1B | 97.1K |
| LC-s2pdf 512k-1M | PDFs | 21.5B | 30.2K |
| LC-s2pdf 1M+ | PDFs | 26.9B | 12.2K |
| **Total** | | **639B** | **22.3M** |
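As a quick consistency check on the sources table, the sketch below (an illustration, not part of any release tooling) re-adds the per-row figures and computes each source's mean document length and share of pool tokens. It assumes the bucket labels are token counts with `k` = 1,024 and `M` = 1,024²; the card states neither convention, and `summarize` is a hypothetical helper name.

```python
# Per-source figures copied from the sources table.
# ASSUMPTION: bucket labels are token counts, with k = 1,024 and M = 1,024**2.
K = 1024
M = K * K

SOURCES = [
    # (source, bucket_low, bucket_high, tokens, docs); high=None means unbounded
    ("LC-s2pdf-REX 32k-64k", 32 * K, 64 * K, 24.1e9, 492e3),
    ("LC-s2pdf-CWE 32k-64k", 32 * K, 64 * K, 8.77e9, 189e3),
    ("LC-s2pdf 32k-64k", 32 * K, 64 * K, 106e9, 2.30e6),
    ("LC-s2pdf 8k-32k (8-16k)", 8 * K, 16 * K, 144e9, 12.7e6),
    ("LC-s2pdf 8k-32k (16-32k)", 16 * K, 32 * K, 115e9, 5.06e6),
    ("LC-s2pdf 64k-128k", 64 * K, 128 * K, 96.0e9, 1.05e6),
    ("LC-s2pdf 128k-256k", 128 * K, 256 * K, 60.8e9, 342e3),
    ("LC-s2pdf 256k-512k", 256 * K, 512 * K, 35.1e9, 97.1e3),
    ("LC-s2pdf 512k-1M", 512 * K, M, 21.5e9, 30.2e3),
    ("LC-s2pdf 1M+", M, None, 26.9e9, 12.2e3),
]


def summarize(sources):
    """Return (total_tokens, total_docs, rows); hypothetical helper.

    Each row is (name, mean tokens per doc, share of pool tokens,
    whether the mean falls inside the labeled length bucket).
    """
    total_tokens = sum(s[3] for s in sources)
    total_docs = sum(s[4] for s in sources)
    rows = []
    for name, low, high, tokens, docs in sources:
        mean_len = tokens / docs
        in_bucket = mean_len >= low and (high is None or mean_len < high)
        rows.append((name, mean_len, tokens / total_tokens, in_bucket))
    return total_tokens, total_docs, rows


if __name__ == "__main__":
    total_tokens, total_docs, rows = summarize(SOURCES)
    print(f"total: {total_tokens / 1e9:.1f}B tokens, {total_docs / 1e6:.2f}M docs")
    for name, mean_len, share, ok in rows:
        print(f"{name:<26} mean {mean_len:>12,.0f} tok/doc  "
              f"{share:6.1%} of tokens  in bucket: {ok}")
```

Under these assumptions every row's mean length falls inside its labeled bucket. The rows sum to about 638.2B tokens and 22.27M documents; the roughly 1B gap to the reported 639B total is within the combined rounding of the individual rows (up to about ±1.8B).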

Licensing Information

Dolma 3 Longmino is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

Citation

@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}