SWE-bench Verified
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--princeton-nlp--swe-bench_verified |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__princeton_nlp__swe_bench_verified,
author = {{Princeton NLP}},
title = {{SWE-bench Verified} Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Nexus Index V2.0
Index Insight
FNI V2.0 for SWE-bench Verified: Semantic (S:50), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).
Dataset Specification
dataset_info:
  features:
  - name: repo
    dtype: string
  - name: instance_id
    dtype: string
  - name: base_commit
    dtype: string
  - name: patch
    dtype: string
  - name: test_patch
    dtype: string
  - name: problem_statement
    dtype: string
  - name: hints_text
    dtype: string
  - name: created_at
    dtype: string
  - name: version
    dtype: string
  - name: FAIL_TO_PASS
    dtype: string
  - name: PASS_TO_PASS
    dtype: string
  - name: environment_setup_commit
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: test
    num_bytes: 7779763
    num_examples: 500
  download_size: 2096679
  dataset_size: 7779763
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
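The configuration above maps the default config's test split to the parquet shards under data/test-*. The following is a minimal sketch of loading that split and checking it against the declared features. It assumes the Hugging Face `datasets` package is installed and the Hub is reachable. `EXPECTED_COLUMNS`, `missing_columns`, and `load_verified_test_split` are hypothetical helpers and are not part of the dataset or the library.

```python
from typing import Iterable, List

# Feature names declared under dataset_info.features in the spec above
# (every one is declared with dtype: string).
EXPECTED_COLUMNS: List[str] = [
    "repo", "instance_id", "base_commit", "patch", "test_patch",
    "problem_statement", "hints_text", "created_at", "version",
    "FAIL_TO_PASS", "PASS_TO_PASS", "environment_setup_commit", "difficulty",
]


def missing_columns(column_names: Iterable[str]) -> List[str]:
    """Return declared feature names absent from `column_names`, in spec order."""
    present = set(column_names)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_verified_test_split():
    """Hypothetical helper: download the test split (needs `datasets` and network)."""
    # Imported lazily so the schema check above works without the library.
    from datasets import load_dataset

    return load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
```

With network access, run `ds = load_verified_test_split()` and then `missing_columns(ds.column_names)`. If the hosted data still matches this card, the check should return an empty list and `len(ds)` should equal the declared `num_examples: 500`.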
Dataset Summary
SWE-bench Verified is a subset of 500 samples from the SWE-bench test set that have been human-validated for quality. SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. See this post for more details on the human-validation process.
The dataset collects 500 test issue/pull request pairs from popular Python repositories. Evaluation is performed by unit-test verification, using the post-PR behavior as the reference solution.
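As we understand the SWE-bench harness, an instance counts as resolved only under one condition. After the candidate patch and the test_patch are applied, every test listed in FAIL_TO_PASS and every test listed in PASS_TO_PASS must pass. The following is a minimal sketch of that check under this assumption. It runs no tests itself. `is_resolved` and the shape of `test_results` are hypothetical, and the format of test identifiers varies by repository.

```python
import json
from typing import Mapping


def is_resolved(row: Mapping[str, str], test_results: Mapping[str, bool]) -> bool:
    """Return True if all FAIL_TO_PASS and PASS_TO_PASS tests passed.

    `row` carries the two fields exactly as stored (JSON-encoded lists of test
    identifiers). `test_results` maps a test identifier to whether it passed
    after applying the candidate patch and the test_patch; a test with no
    recorded result is treated as failed.
    """
    fail_to_pass = json.loads(row["FAIL_TO_PASS"])
    pass_to_pass = json.loads(row["PASS_TO_PASS"])
    # Both lists must pass in full: the fix must flip the issue's tests and
    # must not break previously passing behavior.
    return all(test_results.get(t, False) for t in fail_to_pass + pass_to_pass)
```

Requiring both lists, and not just FAIL_TO_PASS, stops a patch from "fixing" the issue by breaking unrelated behavior.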
The original SWE-bench dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
For inference, this dataset provides only the problem_statement (i.e., the issue text) and the base_commit, which represents the state of the codebase before the issue was resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets:
princeton-nlp/SWE-bench_Lite_oracle
princeton-nlp/SWE-bench_Lite_bm25_13K
princeton-nlp/SWE-bench_Lite_bm25_27K
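Only the issue text and the base commit are provided, so a setup without retrieval has to recreate the repository state itself. The following is a minimal sketch. It assumes each `repo` value can be cloned from `https://github.com/<repo>.git` and that `git` is on the PATH. `checkout_commands` and `build_prompt` are hypothetical, and the prompt wording is illustrative. It is not the template used in the paper.

```python
from typing import List, Mapping


def checkout_commands(row: Mapping[str, str], workdir: str = "repo") -> List[List[str]]:
    """Return git commands that recreate the pre-fix codebase for one instance."""
    # Assumption: `repo` is "owner/name" and is hosted on GitHub.
    url = f"https://github.com/{row['repo']}.git"
    return [
        ["git", "clone", url, workdir],
        ["git", "-C", workdir, "checkout", row["base_commit"]],
    ]


def build_prompt(row: Mapping[str, str]) -> str:
    """Hypothetical zero-retrieval prompt: issue text plus a repository pointer."""
    return (
        f"Repository: {row['repo']} at commit {row['base_commit']}\n\n"
        f"Issue:\n{row['problem_statement']}\n\n"
        "Write a patch in unified diff format that resolves the issue."
    )
```

Each command list could be run with `subprocess.run(cmd, check=True)`. Returning the commands instead of executing them keeps the sketch free of side effects and easy to inspect.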
Supported Tasks and Leaderboards
SWE-bench proposes a new task: issue resolution given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.
Languages
The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.
Dataset Structure
Each SWE-bench datum contains the following fields:
instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue prior to the creation date of the solution PR's first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings representing the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings representing tests that should pass both before and after the PR is applied.
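The instance_id and patch fields can be unpacked with a few string operations. The following is a minimal sketch. It assumes the usual repo_owner__repo_name-PR-number shape described above. It also assumes git-style unified diffs whose file headers start with `diff --git a/... b/...`. `parse_instance_id`, `touched_files`, and `InstanceRef` are hypothetical names. Paths containing spaces are not handled.

```python
import re
from typing import List, NamedTuple


class InstanceRef(NamedTuple):
    owner: str
    name: str
    pr_number: int


def parse_instance_id(instance_id: str) -> InstanceRef:
    """Split an id shaped like repo_owner__repo_name-PR-number.

    The card says ids *usually* follow this shape, so anything that does not
    match raises ValueError instead of being guessed at.
    """
    # Split off the trailing PR number at the last hyphen; repository names
    # such as "scikit-learn" may contain hyphens themselves.
    repo_part, sep, number = instance_id.rpartition("-")
    # The owner and repository name are joined by a double underscore.
    owner, dunder, name = repo_part.partition("__")
    if not (sep and dunder and owner and name and number.isdecimal()):
        raise ValueError(f"unexpected instance_id format: {instance_id!r}")
    return InstanceRef(owner, name, int(number))


# Matches the header line that opens each file section of a git-style diff.
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def touched_files(patch: str) -> List[str]:
    """Return the b/ path of every file section in a git-style diff."""
    return [m.group(2) for m in _DIFF_HEADER.finditer(patch)]
```

For a well-formed row, `f"{ref.owner}/{ref.name}"` is expected to equal the `repo` field. Comparing `touched_files(row["patch"])` with `touched_files(row["test_patch"])` separates the source files changed by the fix from the test files.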
Dataset Transparency Report
Identity & Source
- id: hf-dataset--princeton-nlp--swe-bench_verified
- source: huggingface
- author: Princeton NLP
- tags: size_categories:n<1k, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, region:us
Engagement & Metrics
- likes: 262
- downloads: 640,517