Add your submission (a job or folder of jobs) under submissions/terminal-bench/2.0/<agent>__<model(s)>/
Open a Pull Request
Submission Structure
```text
submissions/
  terminal-bench/
    2.0/
      <agent>__<model(s)>/
        metadata.yaml          # Required: agent and model info
        <job-dir>/             # One or more job directories
          config.json
          <trial-dir>/result.json
          <trial-dir>/result.json
          ...
```
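Before opening a PR, it can help to check the layout locally. The following is a minimal sketch, not the validator the bot runs: `check_layout` is a hypothetical helper, and it assumes that every subdirectory of the submission folder is a job directory and that every subdirectory of a job directory is a trial directory.

```python
import json
from pathlib import Path


def check_layout(submission_dir: Path) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    if not (submission_dir / "metadata.yaml").is_file():
        problems.append("missing metadata.yaml")

    # Assumption: every subdirectory of the submission folder is a job directory.
    job_dirs = sorted(p for p in submission_dir.iterdir() if p.is_dir())
    if not job_dirs:
        problems.append("no job directories found")

    for job in job_dirs:
        if not (job / "config.json").is_file():
            problems.append(f"{job.name}: missing config.json")
        # Assumption: every subdirectory of a job directory is a trial directory.
        for trial in sorted(p for p in job.iterdir() if p.is_dir()):
            result = trial / "result.json"
            if not result.is_file():
                problems.append(f"{job.name}/{trial.name}: missing result.json")
                continue
            try:
                json.loads(result.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                problems.append(f"{job.name}/{trial.name}: result.json is not valid JSON")
    return problems
```

Calling `check_layout(Path("submissions/terminal-bench/2.0/<agent>__<model(s)>"))` with your own folder name substituted returns the problems found. It only checks that files exist and parse as JSON; it does not inspect their contents.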
Required: metadata.yaml
Each submission must include a metadata.yaml file with the following fields:
```yaml
agent_url: https://...                 # Required: link to agent repo/docs
agent_display_name: "My Agent"         # Required: display name for leaderboard
agent_org_display_name: "Org"          # Required: organization name
models:                                # Required: list of models used
  - model_name: gpt-5                  # Required: model identifier
    model_provider: openai             # Required: provider (openai, anthropic, etc.)
    model_display_name: "GPT-5"        # Required
    model_org_display_name: "OpenAI"   # Required
  # - Other models if your agent used multiple
```
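The required fields can be checked before you submit. This is a rough sketch that operates on metadata already loaded into a Python dict (for example with a YAML parser of your choice). `missing_metadata_fields` is a hypothetical helper, and treating empty values as missing is an assumption; the official validator may behave differently.

```python
REQUIRED_TOP_LEVEL = (
    "agent_url",
    "agent_display_name",
    "agent_org_display_name",
    "models",
)
REQUIRED_PER_MODEL = (
    "model_name",
    "model_provider",
    "model_display_name",
    "model_org_display_name",
)


def missing_metadata_fields(metadata: dict) -> list[str]:
    """List required fields that are absent or empty (assumption: empty counts as missing)."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not metadata.get(key)]
    for index, model in enumerate(metadata.get("models") or []):
        for key in REQUIRED_PER_MODEL:
            if not model.get(key):
                missing.append(f"models[{index}].{key}")
    return missing


# Illustrative values only; the URL is a placeholder.
example = {
    "agent_url": "https://example.com/my-agent",
    "agent_display_name": "My Agent",
    "agent_org_display_name": "Org",
    "models": [
        {
            "model_name": "gpt-5",
            "model_provider": "openai",
            "model_display_name": "GPT-5",
            "model_org_display_name": "OpenAI",
        }
    ],
}
print(missing_metadata_fields(example))  # prints []
```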
Job Directory Requirements
Each job directory must contain all of the contents of your run.
Validation Rules
Your submission will be automatically validated. To pass:
timeout_multiplier must equal 1.0
No agent timeout overrides (override_timeout_sec, max_timeout_sec)
No verifier timeout overrides
No resource overrides (override_cpus, override_memory_mb, override_storage_mb)
All trial directories must have valid result.json files
Trial directories must also contain the other artifacts from the run, not only result.json
Each task must be evaluated with a minimum of five trials. We recommend the -k 5 flag for convenience.
Agents must not access the Terminal-Bench website or GitHub repository (this counts as reward hacking)
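Some of these rules can be pre-checked locally. The sketch below is an illustration under stated assumptions, not the bot's implementation. It searches a parsed config.json for the keys named above wherever they are nested, since the exact layout of config.json is not described here. It treats a `null` override as "not set", which is also an assumption. Verifier override key names are not listed above, so they are only caught if they share one of these names. Both function names are hypothetical.

```python
FORBIDDEN_OVERRIDES = {
    "override_timeout_sec",
    "max_timeout_sec",
    "override_cpus",
    "override_memory_mb",
    "override_storage_mb",
}


def find_config_violations(node, path="config"):
    """Recursively collect rule violations from parsed config.json data."""
    violations = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}"
            if key == "timeout_multiplier" and value != 1.0:
                violations.append(f"{here} is {value!r}, expected 1.0")
            elif key in FORBIDDEN_OVERRIDES and value is not None:
                # Assumption: a null value means the override is not in use.
                violations.append(f"{here} is set to {value!r}")
            violations.extend(find_config_violations(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            violations.extend(find_config_violations(item, f"{path}[{index}]"))
    return violations


def tasks_below_min_trials(trial_counts, minimum=5):
    """Return the tasks with fewer than `minimum` trials, sorted by name."""
    return sorted(task for task, count in trial_counts.items() if count < minimum)
```

To use it, load each job's config.json with `json.load` and pass the result to `find_config_violations`. How trials map to tasks depends on your run output, so `tasks_below_min_trials` takes a task-to-count mapping that you build yourself.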
Submission Process
Open PR: When you open a Pull Request, our bot will automatically validate your submission
Fix Issues: If validation fails, the bot will comment with specific errors to fix
Merge: Once validation passes, a maintainer will review and merge your PR
Import: After merge, results are automatically imported to the leaderboard