📊
Dataset

SWE-smith

by SWE-bench (hf-dataset--swe-bench--swe-smith)
Nexus Index
36.3 (Top 100%)
S / A / P / R / Q Breakdown: Calibration Pending

Pillar scores are computed during the next indexing cycle.

Tech Context
Vital Performance: 0 DL / 30D (0.0%)
Data Integrity: 36.3 FNI Score
Size: -
Rows: -
Format: Parquet
Tokens: -
Dataset Information Summary
Entity Passport
Registry ID: hf-dataset--swe-bench--swe-smith
License: MIT
Provider: huggingface
📜 Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__swe_bench__swe_smith,
  author = {{SWE-bench}},
  title = {{SWE-smith} Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/swe-bench/swe-smith}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
SWE-bench. (2026). SWE-smith [Dataset]. Free2AITools. https://huggingface.co/datasets/swe-bench/swe-smith

🔬 Technical Deep Dive

âš–ī¸ Nexus Index V2.0

36.3
ESTIMATED IMPACT TIER
Semantic (S) 0
Authority (A) 0
Popularity (P) 0
Recency (R) 0
Quality (Q) 0

💬 Index Insight

FNI V2.0 for SWE-smith: Semantic (S:0), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).

Free2AITools Nexus Index

Verification Authority

Unbiased Data Node Refresh: VFS Live
⬇️ Downloads: 29,841

đŸ‘ī¸ Data Preview

📊

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

🧬 Field Logic

Schema not yet indexed for this dataset.

Dataset Specification

Code • Paper • Site

[12/14/2025] NOTE: We will no longer actively update this dataset. While this dataset is still functional and usable, we recommend you use the `SWE-bench/SWE-smith-[lang]` datasets. For better maintainability and ease-of-use, we are maintaining language-specific datasets in lieu of this mono-repo.

The SWE-smith Dataset is a training dataset of 50,137 task instances from 128 GitHub repositories, collected using the SWE-smith toolkit. It is the largest dataset to date for training software engineering agents.

All SWE-smith task instances come with an executable environment. To learn more about how to use this dataset to train Language Models for Software Engineering, please refer to the documentation.
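
As a rough sketch of getting started, the dataset can be pulled from the Hugging Face Hub with the `datasets` library and tallied per repository. The repository ID `SWE-bench/SWE-smith` follows the naming used in the note above; the `train` split name and the helper names `load_swe_smith` and `count_by_repo` are assumptions for illustration, not something this page confirms.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_swe_smith(split: str = "train"):
    """Load the dataset from the Hugging Face Hub (needs `pip install datasets`).

    The split name "train" is an assumption; adjust it if the Hub reports
    a different split.
    """
    # Imported lazily so count_by_repo stays usable without the library.
    from datasets import load_dataset

    return load_dataset("SWE-bench/SWE-smith", split=split)


def count_by_repo(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count task instances per value of the `repo` field."""
    return Counter(str(row["repo"]) for row in rows)


def print_summary(rows) -> None:
    """Print the total instance count and the five most common repos."""
    counts = count_by_repo(rows)
    print(f"{sum(counts.values())} instances across {len(counts)} repositories")
    for repo, n in counts.most_common(5):
        print(f"{repo}: {n}")
```

With network access, `print_summary(load_swe_smith())` would report the instance and repository counts, which can be compared with the figures quoted on this page.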

📊 Structured Schema (Zero-Fabrication)

Feature Key          Data Type
instance_id          string
patch                string
FAIL_TO_PASS         unknown
PASS_TO_PASS         unknown
image_name           string
repo                 string
problem_statement    string

Estimated Rows: 59,136
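
Because the schema lists `FAIL_TO_PASS` and `PASS_TO_PASS` as `unknown`, a consumer may want to normalize them defensively. The sketch below accepts either a list of test names or a JSON-encoded string; both encodings are assumptions about what the fields might hold, and `TaskInstance`, `as_test_list`, and `parse_instance` are hypothetical names, not part of the SWE-smith toolkit.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Mapping


def as_test_list(value: Any) -> List[str]:
    """Coerce a FAIL_TO_PASS / PASS_TO_PASS value into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        try:
            value = json.loads(text)  # e.g. '["t1", "t2"]'
        except json.JSONDecodeError:
            return [text]  # not JSON: treat it as a single test name
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]


@dataclass
class TaskInstance:
    """One row, using the field names from the schema above."""
    instance_id: str
    repo: str
    image_name: str
    patch: str
    problem_statement: str
    fail_to_pass: List[str]
    pass_to_pass: List[str]


def parse_instance(row: Mapping[str, Any]) -> TaskInstance:
    """Build a TaskInstance from a raw row mapping."""
    return TaskInstance(
        instance_id=str(row["instance_id"]),
        repo=str(row["repo"]),
        image_name=str(row["image_name"]),
        patch=str(row["patch"]),
        problem_statement=str(row["problem_statement"]),
        fail_to_pass=as_test_list(row.get("FAIL_TO_PASS")),
        pass_to_pass=as_test_list(row.get("PASS_TO_PASS")),
    )
```

Normalizing at the boundary keeps downstream code independent of how a given export happens to encode the two test lists.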

Social Proof

HuggingFace Hub
29.8K Downloads
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

đŸ›Ąī¸ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: hf-dataset--swe-bench--swe-smith
slug: swe-bench--swe-smith
source: huggingface
author: Swe Bench
license: MIT
tags: task_categories:text-generation, language:en, license:mit, size_categories:10k

âš™ī¸ Technical Specs

architecture
null
params billions
null
context length
null
pipeline tag

📊 Engagement & Metrics

downloads: 29,841
stars: 48
forks: 0

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)