πŸ“Š Dataset: Superior-Reasoning-SFT-gpt-oss-120b-Logprob
by Alibaba Apsara
Nexus Index: 32.6 (Top 100%)
- Semantic (S): 50
- Authority (A): 0
- Popularity (P): 51
- Recency (R): 60
- Quality (Q): 30
Dataset Information Summary
Entity Passport
Registry ID: hf-dataset--alibaba-apsara--superior-reasoning-sft-gpt-oss-120b-logprob
License: CC-BY-4.0
Provider: huggingface
πŸ“œ Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__alibaba_apsara__superior_reasoning_sft_gpt_oss_120b_logprob,
  author = {Alibaba Apsara},
  title = {Superior Reasoning Sft Gpt Oss 120b Logprob Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/alibaba-apsara/superior-reasoning-sft-gpt-oss-120b-logprob}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Alibaba Apsara. (2026). Superior Reasoning Sft Gpt Oss 120b Logprob [Dataset]. Free2AITools. https://huggingface.co/datasets/alibaba-apsara/superior-reasoning-sft-gpt-oss-120b-logprob



πŸ‘οΈ Data Preview

πŸ“Š

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

πŸ”— Explore Full Dataset β†—

🧬 Field Logic

🧬

Schema not yet indexed for this dataset.

Dataset Specification

Superior-Reasoning-SFT-gpt-oss-120b-Logprob


πŸš€ Overview

This dataset contains the token-level log-probabilities generated by the teacher model (gpt-oss-120b) for the reasoning samples in the main Superior-Reasoning-SFT-gpt-oss-120b Dataset.
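Token-level teacher log-probabilities like these are typically consumed as soft supervision during distillation. As an illustrative sketch only (not necessarily the loss used in the accompanying paper), a truncated forward-KL term can be computed over the sampled tokens, since only the chosen token's teacher logprob is stored:

```python
import math

def distill_token_loss(teacher_logprobs, student_logprobs):
    """Per-token distillation term restricted to the sampled tokens:
    weight the student/teacher log-ratio by the teacher's probability
    of each realized token.

    Only the teacher logprob of the *chosen* token is stored in this
    dataset, so a full-vocabulary KL is not recoverable; this truncated
    form is one common approximation.
    """
    assert len(teacher_logprobs) == len(student_logprobs)
    loss = 0.0
    for lp_t, lp_s in zip(teacher_logprobs, student_logprobs):
        p_t = math.exp(lp_t)          # teacher prob of the sampled token
        loss += p_t * (lp_t - lp_s)   # p_t * log(p_t / q_s)
    return loss / len(teacher_logprobs)
```

When teacher and student agree on every sampled token, the term vanishes; it grows as the student assigns less probability than the teacher to tokens the teacher was confident about.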

πŸ”— Relationship to Main Dataset

This dataset is a companion to the main Superior-Reasoning-SFT-gpt-oss-120b dataset. Records are linked via a unique sample_uuid.

  • Main Dataset: Contains the text (prompts, responses), domain info, and high-level metadata.
  • Logprobs Dataset (This): Contains the token IDs and their corresponding log-probability values from the teacher.

πŸ“„ Data Format

The data in the DASD-Thinking dataset follows a structured format:

Example:

json
{
    "sample_uuid": "e6ebfe0b-62a2-4b8b-ac53-4c8d48facbea",
    "token_ids": [2167, 1309, ...],
    "logprobs": [-0.08339496701955795, -1.2306615114212036, ...]
}
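A record in this format can be sanity-checked and summarized with a few lines of Python (a minimal sketch; the elided `...` entries stand for the full token sequence):

```python
import math

def summarize_record(record):
    """Basic integrity checks plus summary statistics for one record."""
    assert isinstance(record["sample_uuid"], str)
    assert len(record["token_ids"]) == len(record["logprobs"])
    assert all(lp <= 0.0 for lp in record["logprobs"])  # log-probs are <= 0
    mean_lp = sum(record["logprobs"]) / len(record["logprobs"])
    return {"n_tokens": len(record["token_ids"]),
            "mean_logprob": mean_lp,
            "perplexity": math.exp(-mean_lp)}
```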

πŸ“Š Proven Effectiveness

Models trained on this specific dataset recipe achieve State-of-the-Art performance for their size class.

4B Dense Model Performance

| Model / Setting | AIME24 | AIME25 | LiveCodeBench v5 | LiveCodeBench v6 | GPQA-D |
|---|---|---|---|---|---|
| Qwen3-4B-Instruct-2507 | - | 47.4 | - | 35.1 | 62.5 |
| + Low-Temperature Training (stage 1) | 84.2 | 74.0 | 56.6 | 50.6 | 67.7 |
| + High-Temperature Training (stage 2) | 87.7 | 83.0 | 68.4 | 67.2 | 67.6 |

30B MoE Model Performance

DASD-30B-A3B-Thinking-Preview (trained on Stage 1 data only) demonstrates strong data efficiency.

| Model | AIME25 | LiveCodeBench v6 | GPQA-D | Average |
|---|---|---|---|---|
| gpt-oss-20b | 91.7 | 61.0 | 71.5 | 74.7 |
| Qwen3-30B-A3B-Thinking-2507 | 85.0 | 66.0 | 73.4 | 74.8 |
| NVIDIA-Nemotron-3-Nano-30B-A3B | 89.1 | 68.3 | 73.0 | 76.8 |
| DASD-30B-A3B-Thinking-Preview (Ours) | 86.7 | 72.8 | 72.3 | 77.3 |

πŸ“œ Dataset Access & License

The dataset is released under CC BY 4.0.

πŸ“š Citation

DASD-Thinking is developed by Alibaba Cloud as part of our mission to advance open, efficient, and trustworthy reasoning systems. If you find this work useful in your research or applications, please cite our technical report.

bibtex
@article{yan2026dasd,
  title={Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning},
  author={Yan, Shaotian and Liu, Kaiyuan and Shen, Chen and Wang, Bing and Fan, Sinan and Zhang, Jun and Wu, Yue and Wang, Zheng and Ye, Jieping},
  year={2026},
  journal={arXiv preprint arXiv:2601.09088},
  url={https://arxiv.org/abs/2601.09088}
}
    
@article{liu2025where,
  title={Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation},
  author={Liu, Kaiyuan and Yan, Shaotian and Miao, Rui and Wang, Bing and Shen, Chen and Zhang, Jun and Ye, Jieping},
  journal={arXiv preprint arXiv:2512.20908},
  year={2025}
}

We welcome collaboration, feedback, and community contributions to push the boundaries of what small models can reason aboutβ€”transparently and responsibly.

πŸ“Š Structured Schema (Zero-Fabrication)

| Feature Key | Data Type |
|---|---|
| sample_uuid | string |
| token_ids | unknown |
| logprobs | unknown |

Estimated Rows: 104,829
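Given the schema above, a lightweight row validator can guard downstream processing (a sketch; the two list columns' element types are listed as unknown upstream, so only key presence and alignment are checked):

```python
EXPECTED_SCHEMA = {"sample_uuid", "token_ids", "logprobs"}

def validate_row(row):
    """Check one row against the schema table above: the three expected
    keys are present, sample_uuid is a string, and the token/logprob
    columns have matching lengths."""
    missing = EXPECTED_SCHEMA - set(row)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(row["sample_uuid"], str):
        raise TypeError("sample_uuid must be a string")
    if len(row["token_ids"]) != len(row["logprobs"]):
        raise ValueError("token_ids and logprobs lengths differ")
    return True
```

Rows can be streamed for checking with the Hugging Face `datasets` library, e.g. `load_dataset("alibaba-apsara/superior-reasoning-sft-gpt-oss-120b-logprob", split="train", streaming=True)` (repo id taken from the dataset URL above).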


πŸ›‘οΈ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

πŸ†” Identity & Source

id: hf-dataset--alibaba-apsara--superior-reasoning-sft-gpt-oss-120b-logprob
slug: alibaba-apsara--superior-reasoning-sft-gpt-oss-120b-logprob
source: huggingface
author: Alibaba Apsara
license: CC-BY-4.0
tags: task_categories:text-generation, language:en, license:cc-by-4.0, size_categories:100k<n<1m, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, arxiv:2601.09088, arxiv:2512.20908, region:us, code, math, scientific-qa, instruction-following, reasoning, thinking, gpt-oss-120b, distill

βš™οΈ Technical Specs

architecture: null
params (billions): 120
context length: 4,096
pipeline tag:
πŸ“Š Engagement & Metrics

downloads: 26,038
stars: 63
forks: 0

Data indexed from public sources. Updated daily.