πŸ“Š Dataset

SWE-bench Verified

by Princeton NLP (hf-dataset--princeton-nlp--swe-bench_verified)
Nexus Index: 32.3 (Top 100%); the full S/A/P/R/Q breakdown is given under Technical Deep Dive below.
Tech Context

Vital Performance: 0 downloads in the last 30 days (0.0%)
Data Integrity: 32.3 FNI Score
Size: - | Rows: - | Tokens: -
Format: Parquet
Dataset Information Summary
Entity Passport
Registry ID hf-dataset--princeton-nlp--swe-bench_verified
Provider huggingface
πŸ“œ Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__princeton_nlp__swe_bench_verified,
  author = {Princeton NLP},
  title = {SWE-bench Verified Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/princeton-nlp/swe-bench_verified}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Princeton NLP. (2026). SWE-bench Verified [Dataset]. Free2AITools. https://huggingface.co/datasets/princeton-nlp/swe-bench_verified

πŸ”¬ Technical Deep Dive

Full Specifications

βš–οΈ Nexus Index V2.0

32.3
TOP 100% SYSTEM IMPACT
Semantic (S) 50
Authority (A) 0
Popularity (P) 67
Recency (R) 12
Quality (Q) 30
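
The exact FNI V2.0 weighting is not published on this page, so the sketch below is a hypothetical illustration only: with equal weights the five sub-scores blend to 31.8, not the reported 32.3, which suggests the production formula weights the components differently.

```python
# Hypothetical illustration: the real FNI V2.0 weights are not published here.
scores = {"S": 50, "A": 0, "P": 67, "R": 12, "Q": 30}  # values from the panel above
weights = {k: 0.2 for k in scores}  # assumed equal weighting

composite = sum(weights[k] * scores[k] for k in scores)
print(round(composite, 1))  # 31.8; the reported index is 32.3, so the
                            # production formula must differ from this sketch.
```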

⬇️ Downloads: 731,917

πŸ‘οΈ Data Preview

πŸ“Š

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

πŸ”— Explore Full Dataset β†—


Dataset Specification

Dataset Summary

SWE-bench Verified is a subset of 500 samples from the SWE-bench test set, which have been human-validated for quality. SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. See this post for more details on the human-validation process.

The dataset collects 500 test Issue-Pull Request pairs from popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution.
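
To make the verification step concrete, here is a minimal sketch, assuming a checkout of the repository at base_commit with its test environment already installed. The official SWE-bench harness (which also handles per-version environment setup) is the authoritative implementation; the function names below are illustrative only.

```python
import json
import subprocess

def tests_pass(repo_dir: str, test_ids: list[str]) -> bool:
    """Run the given pytest identifiers; True iff every test passes."""
    proc = subprocess.run(["python", "-m", "pytest", "-q", *test_ids],
                          cwd=repo_dir, capture_output=True)
    return proc.returncode == 0

def resolved(instance: dict, repo_dir: str, candidate_patch: str) -> bool:
    """Apply a candidate patch plus the instance's test patch, then check that
    the FAIL_TO_PASS tests now pass and the PASS_TO_PASS tests still pass."""
    for patch in (candidate_patch, instance["test_patch"]):
        subprocess.run(["git", "apply", "-"], cwd=repo_dir,
                       input=patch.encode(), check=True)
    return (tests_pass(repo_dir, json.loads(instance["FAIL_TO_PASS"]))
            and tests_pass(repo_dir, json.loads(instance["PASS_TO_PASS"])))
```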

The original SWE-bench dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Want to run inference now? This dataset only contains the problem_statement (i.e. issue text) and the base_commit which represents the state of the codebase before the issue has been resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets.

princeton-nlp/SWE-bench_Lite_oracle

princeton-nlp/SWE-bench_Lite_bm25_13K

princeton-nlp/SWE-bench_Lite_bm25_27K

Supported Tasks and Leaderboards

SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.

Languages

The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

Dataset Structure

An example of a SWE-bench datum is as follows:

instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue prior to the creation date of the solution PR's first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - The commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON-encoded list of strings naming the tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON-encoded list of strings naming tests that should pass both before and after the PR is applied.
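
For orientation, a minimal loading sketch using the Hugging Face datasets library (the dataset ships a single test split, and the two test lists arrive as JSON-encoded strings rather than native lists):

```python
import json
from datasets import load_dataset  # pip install datasets

# Canonical repository id for the dataset URL cited above.
ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

ex = ds[0]
print(ex["instance_id"], ex["repo"], ex["base_commit"][:8])

# FAIL_TO_PASS / PASS_TO_PASS are stored as JSON strings, not lists.
fail_to_pass = json.loads(ex["FAIL_TO_PASS"])
print(f"{len(fail_to_pass)} test(s) must flip from failing to passing")
```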

πŸ“Š Structured Schema (Zero-Fabrication)

Feature Key                Data Type
repo                       string
instance_id                string
base_commit                string
patch                      string
test_patch                 string
problem_statement          string
hints_text                 string
created_at                 string
version                    string
FAIL_TO_PASS               string
PASS_TO_PASS               string
environment_setup_commit   string
difficulty                 string

Estimated Rows: 500
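
A short sketch for confirming the schema and row count locally, assuming the same datasets install as above; note that difficulty appears in the indexed schema but not in the field list earlier, so it is presumably an annotation specific to the Verified subset.

```python
from collections import Counter
from datasets import load_dataset

ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
assert len(ds) == 500               # matches the estimated row count above
assert "difficulty" in ds.features  # Verified-specific annotation column

# Distribution of difficulty labels across the 500 instances.
print(Counter(ds["difficulty"]))
```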

Social Proof

HuggingFace Hub
731.9K Downloads
πŸ”„ Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


πŸ›‘οΈ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

πŸ†” Identity & Source

id
hf-dataset--princeton-nlp--swe-bench_verified
slug
princeton-nlp--swe-bench_verified
source
huggingface
author
Princeton NLP
license
tags
size_categories:n<1k, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, region:us

βš™οΈ Technical Specs

architecture
null
params (billions)
null
context length
null
pipeline tag

πŸ“Š Engagement & Metrics

downloads
731,917
stars
331
forks
0

Data indexed from public sources. Updated daily.