Dataset

SWE-bench Verified

by Princeton NLP (hf-dataset--princeton-nlp--swe-bench_verified)
Nexus Index: 29.0 (Top 1%)


Dataset Information Summary
Entity Passport
Registry ID hf-dataset--princeton-nlp--swe-bench_verified
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__princeton_nlp__swe_bench_verified,
  author = {Princeton NLP},
  title = {SWE-bench Verified Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Princeton NLP. (2026). SWE-bench Verified [Dataset]. Free2AITools. https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified

Nexus Index V2.0

FNI V2.0 for SWE-bench Verified: overall score 29.0; Semantic (S) 50, Authority (A) 0, Popularity (P) 0, Recency (R) 0, Quality (Q) 0.
⬇️
Downloads
640,517
❀️
Likes
262


Dataset Specification


dataset_info:
  features:
    - name: repo
      dtype: string
    - name: instance_id
      dtype: string
    - name: base_commit
      dtype: string
    - name: patch
      dtype: string
    - name: test_patch
      dtype: string
    - name: problem_statement
      dtype: string
    - name: hints_text
      dtype: string
    - name: created_at
      dtype: string
    - name: version
      dtype: string
    - name: FAIL_TO_PASS
      dtype: string
    - name: PASS_TO_PASS
      dtype: string
    - name: environment_setup_commit
      dtype: string
    - name: difficulty
      dtype: string
  splits:
    - name: test
      num_bytes: 7779763
      num_examples: 500
  download_size: 2096679
  dataset_size: 7779763
configs:
  - config_name: default
    data_files:
      - split: test
        path: data/test-*
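
Since the specification lists thirteen string-typed columns, a small check can confirm that a record matches it before further processing. The following is a minimal sketch: the function name `schema_problems` and the placeholder record are hypothetical, and only the column names come from the specification above.

```python
from typing import Any, Mapping

# Column names copied from the dataset_info specification; every dtype is string.
EXPECTED_FIELDS = (
    "repo", "instance_id", "base_commit", "patch", "test_patch",
    "problem_statement", "hints_text", "created_at", "version",
    "FAIL_TO_PASS", "PASS_TO_PASS", "environment_setup_commit", "difficulty",
)


def schema_problems(record: Mapping[str, Any]) -> list[str]:
    """Return human-readable schema violations for one record (empty if none)."""
    problems = []
    for name in EXPECTED_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], str):
            kind = type(record[name]).__name__
            problems.append(f"field {name} is {kind}, expected str")
    return problems


# Hypothetical record with empty placeholder values, for illustration only.
example = {name: "" for name in EXPECTED_FIELDS}
print(schema_problems(example))
```

With the Hugging Face `datasets` library installed, real rows could presumably be obtained with `load_dataset("princeton-nlp/SWE-bench_Verified", split="test")` and passed to the same check.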

Dataset Summary

SWE-bench Verified is a subset of 500 samples from the SWE-bench test set that have been human-validated for quality. SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. See this post for more details on the human-validation process.

The dataset collects 500 test Issue-Pull Request pairs from popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution.
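
One way to read "unit test verification" is that a candidate patch counts as resolving an instance only if every test named in FAIL_TO_PASS and PASS_TO_PASS passes after the patch is applied. The sketch below encodes that reading. It assumes the test results have already been collected into a set of passing test names; the function name `is_resolved` and the sample test names are hypothetical, and this is not the official evaluation harness.

```python
import json


def is_resolved(fail_to_pass: str, pass_to_pass: str, passed_tests: set[str]) -> bool:
    """Check one instance against its required tests.

    fail_to_pass and pass_to_pass are the JSON-encoded lists stored in the
    dataset; passed_tests holds the tests that passed with the patch applied.
    """
    required = json.loads(fail_to_pass) + json.loads(pass_to_pass)
    return all(test in passed_tests for test in required)


# Hypothetical test names, for illustration only.
f2p = '["tests/test_bug.py::test_fixed"]'
p2p = '["tests/test_core.py::test_unchanged"]'
print(is_resolved(f2p, p2p, {"tests/test_bug.py::test_fixed",
                             "tests/test_core.py::test_unchanged"}))
print(is_resolved(f2p, p2p, {"tests/test_core.py::test_unchanged"}))
```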

The original SWE-bench dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Want to run inference now?
This dataset only contains the problem_statement (i.e. issue text) and the base_commit which represents the state of the codebase before the issue has been resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets.

princeton-nlp/SWE-bench_Lite_oracle

princeton-nlp/SWE-bench_Lite_bm25_13K

princeton-nlp/SWE-bench_Lite_bm25_27K
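
Because a record carries only the issue text and the pre-fix commit, an inference setup has to fetch the code itself. The sketch below shows one possible way to turn a record into checkout commands and a bare prompt. The helper names, the prompt layout, the working directory name, and the sample record are all hypothetical; the only assumption about the data is that `repo` is a GitHub `owner/name` identifier, as described in the field list.

```python
from typing import Mapping


def checkout_commands(repo: str, base_commit: str, workdir: str = "repo") -> list[list[str]]:
    """Build git argument lists that clone the repo and check out base_commit."""
    return [
        ["git", "clone", f"https://github.com/{repo}.git", workdir],
        ["git", "-C", workdir, "checkout", base_commit],
    ]


def build_prompt(record: Mapping[str, str]) -> str:
    """Assemble a minimal prompt from the fields this dataset provides."""
    return (
        f"Repository: {record['repo']}\n"
        f"Base commit: {record['base_commit']}\n\n"
        f"Issue:\n{record['problem_statement']}\n"
    )


# Hypothetical record, for illustration only.
record = {
    "repo": "example-owner/example-repo",
    "base_commit": "0123abc",
    "problem_statement": "Calling foo() raises an unexpected error.",
}
for command in checkout_commands(record["repo"], record["base_commit"]):
    print(" ".join(command))
print(build_prompt(record))
```

The command lists are returned rather than executed so that the caller can decide how to run them, for example with `subprocess.run(command, check=True)`.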

Supported Tasks and Leaderboards
SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.

Languages
The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

Dataset Structure

The fields of a SWE-bench datum are as follows:

instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, that is, the patch generated by the PR (minus test-related code) that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR's first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - The commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings representing the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings representing tests that should pass both before and after the PR is applied.
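
Two of these fields need decoding before use: instance_id packs the owner, repository name, and PR number into one string, and the test lists are stored as JSON text rather than as native lists. The following is a minimal sketch of both steps. The regular expression assumes the "usual" repo_owner__repo_name-PR-number layout and returns None for anything else; the helper names and the sample values are hypothetical.

```python
import json
import re
from typing import Optional

# Assumed layout: <owner>__<name>-<number>; identifiers that deviate yield None.
INSTANCE_ID_RE = re.compile(r"^(?P<owner>.+?)__(?P<name>.+)-(?P<number>\d+)$")


def parse_instance_id(instance_id: str) -> Optional[tuple[str, str, int]]:
    """Split an instance_id into (owner, name, PR number), or None if it does not fit."""
    match = INSTANCE_ID_RE.match(instance_id)
    if match is None:
        return None
    return match["owner"], match["name"], int(match["number"])


def decode_tests(field_value: str) -> list[str]:
    """Decode a FAIL_TO_PASS or PASS_TO_PASS value from JSON text to a list."""
    return json.loads(field_value)


# Hypothetical values, for illustration only.
print(parse_instance_id("example-owner__example-repo-123"))
print(decode_tests('["tests/test_example.py::test_fix"]'))
```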

Dataset Transparency Report

id: hf-dataset--princeton-nlp--swe-bench_verified
source: huggingface
author: Princeton NLP
tags: size_categories:n<1k, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, region:us
