openmathreasoning

by nvidia · Dataset

"--- language: - en license: cc-by-4.0 size_categories: - 1M"

🛠️ Technical Profile

  • Files: 234
  • Format: Parquet
  • Cleaning: Raw
  • Source: huggingface
  • License: CC-BY-4.0
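Because the dataset ships as Parquet files on the Hugging Face Hub, it can be loaded with the `datasets` library. The snippet below is a minimal sketch, not part of the original card: the repository ID `nvidia/OpenMathReasoning` is inferred from the header above, and the split names are discovered at runtime rather than assumed.

```python
from datasets import get_dataset_split_names, load_dataset

# Discover the available splits first; the card does not name them explicitly.
splits = get_dataset_split_names("nvidia/OpenMathReasoning")
print(splits)

# Stream one split so the 234 Parquet files are not all downloaded up front.
ds = load_dataset("nvidia/OpenMathReasoning", split=splits[0], streaming=True)
print(next(iter(ds)))  # inspect a single record
```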



Dataset Card

OpenMathReasoning

OpenMathReasoning is a large-scale math reasoning dataset for training large language models (LLMs). This dataset contains:

  • 306K unique mathematical problems sourced from AoPS forums, with:
    • 3.2M long chain-of-thought (CoT) solutions
    • 1.7M long tool-integrated reasoning (TIR) solutions
    • 566K samples that select the most promising solution out of many candidates (GenSelect)
  • An additional 193K problems sourced from AoPS forums (problems only, no solutions)

We used Qwen2.5-32B-Instruct to preprocess problems, and DeepSeek-R1 and QwQ-32B to generate solutions.

This dataset was a foundation of our winning submission to the AIMO-2 Kaggle competition.

See our paper for more details!

_NOTE:_ We initially reported 540K unique problems in our dataset, but this figure represented the question count at the beginning of the pipeline. The CoT and TIR solutions in the released dataset actually correspond to 306K problems. Since our OpenMath-Nemotron models were trained on this reduced subset, all published results remain reproducible with the current release; only our initial problem count was overstated.

Two factors explain this discrepancy in question numbers.

  • Our filtering process removed many questions due to format restrictions (yes/no questions, multiple choice questions, etc.), benchmark decontamination, and the inability of existing LLMs to generate valid solutions for certain problems.
  • A pipeline bug caused us to lose 137K proof-based questions. When we recovered and included this additional data in training, the SFT performance regressed. We are currently testing different approaches for incorporating these recovered questions and will release their solutions only if we identify clear performance improvements.

_NOTE:_ An early version of this data was released separately in Llama-Nemotron-Post-Training-Dataset.

Dataset fields

The OpenMathReasoning dataset contains the following fields:

  • problem: Problem statement extracted from AoPS forums and refined with Qwen2.5-32B-Instruct
  • generated_solution: Synthetically generated solution using either DeepSeek-R1 or QwQ-32B
  • generation_model: DeepSeek-R1 or QwQ-32B
  • problem_type: Can be one of "has_answer_extracted", "no_answer_extracted", or "converted_proof", depending on whether we were able to extract the answer or whether this is a proof question converted to an answer question
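As a quick sanity check on these fields, the hedged sketch below streams a sample of rows and tallies `generation_model` and `problem_type`. The split name `"cot"` is an assumption used only for illustration; substitute a split reported by `get_dataset_split_names` as in the loading example above.

```python
from collections import Counter
from itertools import islice

from datasets import load_dataset

# Stream rows rather than downloading the full dataset; the split name "cot"
# is an assumption and may need to be replaced with an actual split name.
ds = load_dataset("nvidia/OpenMathReasoning", split="cot", streaming=True)

model_counts = Counter()  # expected values: DeepSeek-R1, QwQ-32B
type_counts = Counter()   # e.g. "has_answer_extracted"
for row in islice(ds, 1000):  # sample the first 1,000 records
    model_counts[row["generation_model"]] += 1
    type_counts[row["problem_type"]] += 1

print(model_counts)
print(type_counts)
```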
