Dataset

openwebtext

by Skylion007 (hf-dataset--skylion007--openwebtext)
Nexus Index
25.0 Top 2%
S / A / P / R / Q Breakdown: Calibration Pending

Pillar scores are computed during the next indexing cycle.

Tech Context
Vital Performance
0 DL / 30D
0.0%

---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license:
- cc0-1.0
multilinguality:
- monolingual
pretty_name: OpenWebText
size_categories:
- 1M

Data Integrity
FNI Score: 25
Size: -
Rows: -
Format: Parquet
Tokens: -
Dataset Information Summary
Entity Passport
Registry ID hf-dataset--skylion007--openwebtext
Provider huggingface
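
The page lists the provider as huggingface and the format as Parquet, so one way to inspect rows is the Hugging Face `datasets` library. The sketch below is an assumption rather than anything stated on this page: it takes the repo id `Skylion007/openwebtext` from the citation URL, assumes a `train` split and a `text` column, and the helper names `preview_rows` and `summarize` are hypothetical.

```python
from itertools import islice

# Repo id taken from the citation URL on this page.
REPO_ID = "Skylion007/openwebtext"


def preview_rows(n=3, repo_id=REPO_ID):
    """Stream the first n rows instead of downloading the full dataset."""
    # Imported lazily so this module loads even where `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches records on demand.
    # The "train" split name is an assumption about this dataset.
    stream = load_dataset(repo_id, split="train", streaming=True)
    return list(islice(stream, n))


def summarize(rows, width=80):
    """Return one truncated, single-line snippet per row.

    Assumes each row is a dict with a "text" field.
    """
    return [row["text"][:width].replace("\n", " ") for row in rows]
```

Calling `summarize(preview_rows(3))` requires network access and the `datasets` package. Streaming is used here so that a quick look at a few rows does not trigger a full download.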

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__skylion007__openwebtext,
  author = {Skylion007},
  title = {openwebtext Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/Skylion007/openwebtext}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Skylion007. (2026). openwebtext [Dataset]. Free2AITools. https://huggingface.co/datasets/Skylion007/openwebtext

Technical Deep Dive


Nexus Index V2.0

25.0
ESTIMATED IMPACT TIER
Semantic (S) 50
Authority (A) 0
Popularity (P) 0
Recency (R) 0
Quality (Q) 0

Index Insight

FNI V2.0 for openwebtext: Semantic (S:50), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).

Free2AITools Nexus Index

Verification Authority

Downloads
61,063
Likes
487

Dataset Specification

Top Tier

Social Proof

HuggingFace Hub
487 Likes
61.1K Downloads
Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

Identity & Source

id
hf-dataset--skylion007--openwebtext
source
huggingface
author
Skylion007
tags
task_categories:text-generation, task_categories:fill-mask, task_ids:language-modeling, task_ids:masked-language-modeling, annotations_creators:no-annotation, language_creators:found, multilinguality:monolingual, source_datasets:original, language:en, license:cc0-1.0, size_categories:1m, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, region:us
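
The tags above follow a `key:value` pattern, with some keys repeated. As a minimal sketch of how they could be grouped for filtering, the snippet below splits each tag on its first colon; the `TAGS` list is copied from this page, and `group_tags` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Tags as listed in the Identity & Source block of this page.
TAGS = [
    "task_categories:text-generation",
    "task_categories:fill-mask",
    "task_ids:language-modeling",
    "task_ids:masked-language-modeling",
    "annotations_creators:no-annotation",
    "language_creators:found",
    "multilinguality:monolingual",
    "source_datasets:original",
    "language:en",
    "license:cc0-1.0",
    "size_categories:1m",
    "format:parquet",
    "modality:text",
    "library:datasets",
    "library:dask",
    "library:polars",
    "library:mlcroissant",
    "region:us",
]


def group_tags(tags):
    """Group "key:value" tags into a dict mapping each key to its values."""
    grouped = defaultdict(list)
    for tag in tags:
        # partition splits on the first colon only, so values may contain colons.
        key, _, value = tag.partition(":")
        grouped[key].append(value)
    return dict(grouped)


grouped = group_tags(TAGS)
```

With this grouping, `grouped["library"]` lists the libraries tagged as compatible and `grouped["license"]` holds the license identifier.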

Technical Specs

architecture
null
params billions
null
context length
null

Engagement & Metrics

likes
487
downloads
61,063
