📊

Dataset

Openthoughts 1k Sample

Name: Openthoughts 1k Sample
Creator: ryanmarten

by ryanmarten hf-dataset--ryanmarten--openthoughts-1k-sample

Nexus Index

35.0 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 66

R: Recency 30

Q: Quality 30

Tech Context

Vital Performance

0 DL / 30D

0.0%

Source →

Data Integrity 35 FNI Score

- Size

- Rows

Parquet Format

- Tokens

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--ryanmarten--openthoughts-1k-sample
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__ryanmarten__openthoughts_1k_sample,
  author = {ryanmarten},
  title = {Openthoughts 1k Sample Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/ryanmarten/openthoughts-1k-sample}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

ryanmarten. (2026). Openthoughts 1k Sample [Dataset]. Free2AITools. https://huggingface.co/datasets/ryanmarten/openthoughts-1k-sample

🔬Technical Deep Dive

Full Specifications [+]

⚖️ Nexus Index V2.0

Methodology Index Protocol

35.0

TOP 100% SYSTEM IMPACT

Semantic (S) 50

Authority (A) 0

Popularity (P) 66

Recency (R) 30

Quality (Q) 30

💬 Index Insight

FNI V2.0 for Openthoughts 1k Sample: Semantic (S:50), Authority (A:0), Popularity (P:66), Recency (R:30), Quality (Q:30).

Free2AITools Nexus Index

Verification Authority

HuggingFace API GitHub Metadata Arxiv Citation DB System Audit

Unbiased Data Node Refresh: VFS Live

⬇️

Downloads

574,597

👁️ Data Preview

📊

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

🔗 Explore Full Dataset ↗

🧬 Field Logic

🧬

Schema not yet indexed for this dataset.

Dataset Specification

[!NOTE] We have released a paper for OpenThoughts! See our paper here.

Open-Thoughts-1k-sample

Dataset Description

Homepage: https://www.open-thoughts.ai/
Repository: https://github.com/open-thoughts/open-thoughts
Point of Contact: Open Thoughts Team

This is a 1k sample of the OpenThoughts-114k dataset.

Open synthetic reasoning dataset with high-quality examples covering math, science, code, and puzzles!

Inspect the content with rich formatting with Curator Viewer.

Available Subsets

default subset containing ready-to-train data used to finetune the OpenThinker-7B and OpenThinker-32B models:

text

ds = load_dataset("ryanmarten/OpenThoughts-1k-sample", split="train")

metadata subset containing extra columns used in dataset construction:

problem
ground_truth_solution
deepseek_reasoning
deepseek_solution
domain
source
test_cases (code only)
starter_code(code only)

text

ds = load_dataset("ryanmarten/OpenThoughts-1k-sample", "metadata", split="train")

OpenThinker Models

The numbers reported in the tables below are evaluated with our open-source tool Evalchemy.

	AIME24	MATH500	GPQA-Diamond	LCBv2 Easy	LCBv2 Medium	LCBv2 Hard	LCBv2 All
OpenThinker-32B	66	90.6	61.6	95.1	70.9	26.8	68.9
OpenThinker-7B	31.3	83.0	42.4	75.3	28.6	6.5	39.9
Bespoke-Stratos-7B	22.7	79.6	38.9	71.4	25.2	0.8	35.8
DeepSeek-R1-Distill-Qwen-7B	60	88.2	46.9	79.7	45.1	14.6	50.1
gpt-4o-0513	8.7	75.8	46.5	87.4	42.7	8.9	50.5
o1-mini	64	85.6	60	92.8	74.7	39.8	72.8

We are fully open-source. Our model weights, datasets, data generation code, evaluation code, and training code are all publicly available.

	Open Weights	Open Data	Open Code
OpenThinker-32B	✅	✅	✅
OpenThinker-7B	✅	✅	✅
Bespoke-Stratos-7B	✅	✅	✅
DeepSeek-R1-Distill models	✅	❌	❌
OpenAI/Gemini	❌	❌	❌

We are actively working towards improving the dataset, so please stay tuned!

Data Curation Recipe

Code

Math

AI-MO/NuminaMath-CoT

Science

Puzzle

INK-USC/riddle_sense

Using a curated mix of the datasets above, we generate reasoning traces from DeepSeek-R1 and verify correctness to construct the final dataset.

diagram

The full code for the data generation pipeline is publicly available in our github repo.

Citation

text

@misc{guha2025openthoughtsdatarecipesreasoning,
  title={OpenThoughts: Data Recipes for Reasoning Models}, 
  author={Etash Guha and Ryan Marten and Sedrick Keh and Negin Raoof and Georgios Smyrnis and Hritik Bansal and Marianna Nezhurina and Jean Mercat and Trung Vu and Zayne Sprague and Ashima Suvarna and Benjamin Feuer and Liangyu Chen and Zaid Khan and Eric Frankel and Sachin Grover and Caroline Choi and Niklas Muennighoff and Shiye Su and Wanjia Zhao and John Yang and Shreyas Pimpalgaonkar and Kartik Sharma and Charlie Cheng-Jie Ji and Yichuan Deng and Sarah Pratt and Vivek Ramanujan and Jon Saad-Falcon and Jeffrey Li and Achal Dave and Alon Albalak and Kushal Arora and Blake Wulfe and Chinmay Hegde and Greg Durrett and Sewoong Oh and Mohit Bansal and Saadia Gabriel and Aditya Grover and Kai-Wei Chang and Vaishaal Shankar and Aaron Gokaslan and Mike A. Merrill and Tatsunori Hashimoto and Yejin Choi and Jenia Jitsev and Reinhard Heckel and Maheswaran Sathiamoorthy and Alexandros G. Dimakis and Ludwig Schmidt},
  year={2025},
  eprint={2506.04178},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.04178}, 
}

📊 Structured Schema (Zero-Fabrication)

Feature Key	Data Type
`system`	`string`
`conversations`	`unknown`

Estimated Rows: 1,000

Social Proof

HuggingFace Hub

574.6KDownloads

Hub Discussions

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology 📚 Knowledge Baseℹ️ Verify with original source

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--ryanmarten--openthoughts-1k-sample
slug: ryanmarten--openthoughts-1k-sample
source: huggingface
author: ryanmarten
license
tags: size_categories:1k<n<10k, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2506.04178, region:us

⚙️ Technical Specs

architecture: null
params billions: null
context length: 1,024
pipeline tag

📊 Engagement & Metrics

downloads: 574,597
stars: 8
forks: 0

Data indexed from public sources. Updated daily.

Welcome to Free2AI Tools!

Smart Search

FNI Score

You're All Set!

Cite this dataset

🔬Technical Deep Dive

⚖️ Nexus Index V2.0

💬 Index Insight

Verification Authority

👁️ Data Preview

🧬 Field Logic

Dataset Specification

Open-Thoughts-1k-sample

Dataset Description

Available Subsets

OpenThinker Models

Data Curation Recipe

Links

Citation

📊 Structured Schema (Zero-Fabrication)

Social Proof

🛡️ Dataset Transparency Report

🆔 Identity & Source

⚙️ Technical Specs

📊 Engagement & Metrics