Dataset

SWE-smith

Name: SWE-smith
Creator: SWE-bench
License: MIT

Nexus Index (FNI): 33.3 (Top 100%)

- Semantic (S): 50
- Authority (A): 0
- Popularity (P): 52
- Recency (R): 51
- Quality (Q): 30


Format: Parquet

Dataset Information Summary

Registry ID: hf-dataset--swe-bench--swe-smith
License: MIT
Provider: huggingface


Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__swe_bench__swe_smith,
  author = {SWE-bench},
  title = {SWE-smith Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/swe-bench/swe-smith}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

SWE-bench. (2026). SWE-smith [Dataset]. Free2AITools. https://huggingface.co/datasets/swe-bench/swe-smith


Downloads: 29,841


Dataset Specification

SWE-smith Dataset


[12/14/2025] NOTE: We will no longer actively update this dataset. While it remains functional and usable, we recommend using the `SWE-bench/SWE-smith-[lang]` datasets instead. For better maintainability and ease of use, we are maintaining language-specific datasets in lieu of this mono-repo.

The SWE-smith Dataset is a training dataset of 50,137 task instances from 128 GitHub repositories, collected using the SWE-smith toolkit. It is the largest dataset to date for training software engineering agents.

All SWE-smith task instances come with an executable environment. To learn more about using this dataset to train language models for software engineering, please refer to the documentation.

Structured Schema (feature key: data type)

- `instance_id`: `string`
- `patch`: `string`
- `FAIL_TO_PASS`: `unknown`
- `PASS_TO_PASS`: `unknown`
- `image_name`: `string`
- `repo`: `string`
- `problem_statement`: `string`
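The schema lists `FAIL_TO_PASS` and `PASS_TO_PASS` with type `unknown`. In SWE-bench-style datasets, `FAIL_TO_PASS` typically lists tests that should go from failing to passing once a task is resolved, and `PASS_TO_PASS` lists tests that should keep passing. The minimal sketch below assumes each value is either a list of strings or a JSON-encoded list, normalizes both forms, and tallies instances per `repo`. The helper names (`as_test_list`, `summarize`) and the toy records are hypothetical; the commented `load_dataset` call assumes the Hugging Face `datasets` library, network access, a `train` split, and the repo id shown.

```python
import json
from collections import Counter
from typing import Any, Iterable


def as_test_list(value: Any) -> list[str]:
    """Normalize a FAIL_TO_PASS / PASS_TO_PASS value into a list of test IDs.

    Assumption: the value is None, a list of strings, or a JSON-encoded list
    of strings. None and empty strings are treated as "no tests".
    """
    if value is None:
        return []
    if isinstance(value, str):
        # JSON-encoded form, e.g. '["tests/test_a.py::test_x"]'
        value = json.loads(value) if value.strip() else []
    return [str(test) for test in value]


def summarize(records: Iterable[dict]) -> dict:
    """Count task instances, distinct repos, and total F2P / P2P tests."""
    per_repo: Counter = Counter()
    fail_to_pass = 0
    pass_to_pass = 0
    for rec in records:
        per_repo[rec["repo"]] += 1
        # .get() tolerates records where a test field is missing entirely
        fail_to_pass += len(as_test_list(rec.get("FAIL_TO_PASS")))
        pass_to_pass += len(as_test_list(rec.get("PASS_TO_PASS")))
    return {
        "instances": sum(per_repo.values()),
        "repos": len(per_repo),
        "fail_to_pass_tests": fail_to_pass,
        "pass_to_pass_tests": pass_to_pass,
    }


if __name__ == "__main__":
    # Hypothetical toy records that reuse the schema's field names;
    # they are illustrative only, not rows from the real dataset.
    toy_records = [
        {
            "instance_id": "demo__alpha.1",
            "repo": "demo/alpha",
            "patch": "",
            "FAIL_TO_PASS": ["tests/test_a.py::test_x"],
            "PASS_TO_PASS": '["tests/test_a.py::test_y"]',
            "image_name": "demo-image",
            "problem_statement": "Toy problem statement.",
        },
        {
            "instance_id": "demo__beta.1",
            "repo": "demo/beta",
            "patch": "",
            "FAIL_TO_PASS": '["tests/test_b.py::test_1", "tests/test_b.py::test_2"]',
            "PASS_TO_PASS": None,
            "image_name": "demo-image",
            "problem_statement": "Toy problem statement.",
        },
    ]
    print(summarize(toy_records))
    # → {'instances': 2, 'repos': 2, 'fail_to_pass_tests': 3, 'pass_to_pass_tests': 1}

    # On the real data (assumes the `datasets` library, network access,
    # and a "train" split; the repo id is an assumption):
    #   from datasets import load_dataset
    #   ds = load_dataset("SWE-bench/SWE-smith", split="train")
    #   print(summarize(ds))
```

Accepting both list and JSON-string forms keeps the helper usable whichever representation the `unknown`-typed fields turn out to have; the same functions should also apply to the language-specific datasets if they share this schema, which is an assumption.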

Estimated Rows: 59,136


Data Source: Hugging Face (daily sync at 03:00 UTC)


Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Identity & Source

id: hf-dataset--swe-bench--swe-smith
slug: swe-bench--swe-smith
source: huggingface
author: Swe Bench
license: MIT
tags: task_categories:text-generation, language:en, license:mit, size_categories:10k<n<100k, format:parquet, modality:text, library:datasets, library:dask, library:mlcroissant, library:polars, arxiv:2504.21798, region:us, code, agents, software-engineering


Engagement & Metrics

downloads: 29,841
stars: 48
forks: 0
