Terminal Bench 2 Leaderboard

Name: Terminal Bench 2 Leaderboard
Creator: sadahiy
License: Apache-2.0

Nexus Index

41.6 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 49

R: Recency 89

Q: Quality 30

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--sadahiy--terminal-bench-2-leaderboard
License	Apache-2.0
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__sadahiy__terminal_bench_2_leaderboard,
  author = {sadahiy},
  title = {Terminal Bench 2 Leaderboard Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/sadahiy/terminal-bench-2-leaderboard}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

sadahiy. (2026). Terminal Bench 2 Leaderboard [Dataset]. Free2AITools. https://huggingface.co/datasets/sadahiy/terminal-bench-2-leaderboard

⬇️

Downloads

22,326

Dataset Specification

Terminal-Bench 2.0 Leaderboard Submissions

This repository accepts leaderboard submissions for Terminal-Bench 2.0.

How to Submit

1. Fork this repository
2. Create a new branch for your submission
3. Add your submission (a job or folder of jobs) under submissions/terminal-bench/2.0/<agent>__<model(s)>/
4. Open a Pull Request

Submission Structure

submissions/
  terminal-bench/
    2.0/
      <agent>__<model(s)>/
        metadata.yaml       # Required: agent and model info
        <job-folder>/       # One or more job directories
          config.json
          <trial-1>/result.json
          <trial-2>/result.json
          ...
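
Before opening a Pull Request, it can help to check the layout locally. The following is a minimal sketch, not the repository's actual validator: the function name `check_layout` is hypothetical, and it assumes that every subdirectory of the submission folder is a job directory and every subdirectory of a job directory is a trial directory.

```python
from pathlib import Path


def check_layout(submission_dir: Path) -> list[str]:
    """Return a list of layout problems found under one submission folder.

    Assumption (not confirmed by the source): every subdirectory of the
    submission folder is a job directory, and every subdirectory of a job
    directory is a trial directory.
    """
    problems = []
    if not (submission_dir / "metadata.yaml").is_file():
        problems.append("missing metadata.yaml")

    job_dirs = sorted(p for p in submission_dir.iterdir() if p.is_dir())
    if not job_dirs:
        problems.append("no job directories found")

    for job in job_dirs:
        if not (job / "config.json").is_file():
            problems.append(f"{job.name}: missing config.json")
        trial_dirs = sorted(p for p in job.iterdir() if p.is_dir())
        if not trial_dirs:
            problems.append(f"{job.name}: no trial directories found")
        for trial in trial_dirs:
            if not (trial / "result.json").is_file():
                problems.append(f"{job.name}/{trial.name}: missing result.json")
    return problems
```

An empty list means the folder matches the tree above; it says nothing about whether the files themselves are valid.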

Required: metadata.yaml

Each submission must include a metadata.yaml file with the following fields:

agent_url: https://...         # Required: link to agent repo/docs
agent_display_name: "My Agent" # Required: display name for leaderboard
agent_org_display_name: "Org"  # Required: organization name

models:                              # Required: list of models used
  - model_name: gpt-5                # Required: model identifier
    model_provider: openai           # Required: provider (openai, anthropic, etc.)
    model_display_name: "GPT-5"      # Required
    model_org_display_name: "OpenAI" # Required
  # - Other models if your agent used multiple
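
The required fields above can be checked before submitting. This is a rough sketch under stated assumptions: `check_metadata` is a hypothetical helper, it takes the file contents already parsed into a Python dict (for example with PyYAML's `yaml.safe_load`, which is not part of the standard library), and it treats empty values as missing. The repository's own bot may apply different or stricter checks.

```python
# Field names are taken from the metadata.yaml example above.
REQUIRED_TOP_LEVEL = ("agent_url", "agent_display_name", "agent_org_display_name")
REQUIRED_MODEL_FIELDS = (
    "model_name",
    "model_provider",
    "model_display_name",
    "model_org_display_name",
)


def check_metadata(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means all required fields exist."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if not meta.get(key):  # assumption: empty strings count as missing
            problems.append(f"missing field: {key}")

    models = meta.get("models")
    if not isinstance(models, list) or not models:
        problems.append("models must be a non-empty list")
        return problems

    for i, model in enumerate(models):
        if not isinstance(model, dict):
            problems.append(f"models[{i}] must be a mapping")
            continue
        for key in REQUIRED_MODEL_FIELDS:
            if not model.get(key):
                problems.append(f"models[{i}] missing field: {key}")
    return problems
```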

Job Directory Requirements

Each job directory must contain all of the contents of your run.

Validation Rules

Your submission will be automatically validated. To pass:

- timeout_multiplier must equal 1.0
- No agent timeout overrides (override_timeout_sec, max_timeout_sec)
- No verifier timeout overrides
- No resource overrides (override_cpus, override_memory_mb, override_storage_mb)
- All trial directories must have valid result.json files
- Trial directories must contain other artifacts from the run
- Each task must be evaluated with a minimum of five trials. We recommend the -k 5 flag for convenience.
- Agents must not access the Terminal-Bench website or GitHub repository; doing so is treated as reward hacking
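
The timeout and resource rules can be pre-checked against a job's config.json. The sketch below is illustrative only. The source does not document the layout of config.json, so the hypothetical `check_config` helper searches every nesting level for the key names listed above, and it assumes that a `null` value means "not overridden". It does not cover the trial-count or artifact rules.

```python
# Key names come from the validation rules; where they live inside
# config.json is an assumption, so the whole document is searched.
FORBIDDEN_KEYS = {
    "override_timeout_sec",
    "max_timeout_sec",
    "override_cpus",
    "override_memory_mb",
    "override_storage_mb",
}


def iter_items(node, path="$"):
    """Yield (path, key, value) for every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            yield child, key, value
            yield from iter_items(value, child)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from iter_items(value, f"{path}[{i}]")


def check_config(config) -> list[str]:
    """Return a list of rule violations found in a parsed config.json."""
    problems = []
    for path, key, value in iter_items(config):
        if key == "timeout_multiplier" and value != 1.0:
            problems.append(f"{path} must equal 1.0 (found {value!r})")
        elif key in FORBIDDEN_KEYS and value is not None:
            # Assumption: null means the override is unset.
            problems.append(f"{path} must not be set (found {value!r})")
    return problems
```

A typical use would be `check_config(json.load(open("config.json")))` for each job directory, treating any non-empty result as a reason to re-run without overrides.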

Submission Process

1. Open PR: When you open a Pull Request, our bot will automatically validate your submission
2. Fix Issues: If validation fails, the bot will comment with specific errors to fix
3. Merge: Once validation passes, a maintainer will review and merge your PR
4. Import: After merge, results are automatically imported to the leaderboard

Questions?

Open an issue in this repository or contact [email protected].

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--sadahiy--terminal-bench-2-leaderboard
slug: sadahiy--terminal-bench-2-leaderboard
source: huggingface
author: sadahiy
license: Apache-2.0
tags: license:apache-2.0, region:us

📊 Engagement & Metrics

downloads: 22,326
stars: 0
forks: 0

Data indexed from public sources. Updated daily.
