Dataset

TMPFILE

Name: TMPFILE
Creator: Tuyuanpeng

Free2AITools Nexus Index

59.7

S: Semantic 50

Query-time baseline · scored live at search

A: Authority 61

P: Popularity 50

R: Recency 76

Q: Quality 50

Dataset Information Summary
Entity Passport
Registry ID	tuyuanpeng/tmpfile
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset_tuyuanpeng_tmpfile,
  author = {Tuyuanpeng},
  title = {TMPFILE Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/Tuyuanpeng/TMPFILE}},
  note = {Accessed via Free2AITools.}
}

APA Style

Tuyuanpeng. (2026). TMPFILE [Dataset]. Free2AITools. https://huggingface.co/datasets/Tuyuanpeng/TMPFILE

⚖️ Free2AITools Nexus Index V2.0

💬 Index Insight

FNI V2.0 for TMPFILE: Authority (A:61), Popularity (P:50), Recency (R:76), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.

Data Sources / Provenance

HuggingFace API · GitHub Metadata · Arxiv Citation DB · Methodology

Open data Updated: Live data

⬇️

Downloads

27,509

Dataset Specification

mini-swe-agent prompt search notes

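This repository has always provided two separate layers of capability, but the entry points used to be unclear, so it was easy to assume that "the agent goes online and rewrites its own prompt within a single run".

Runtime web access: src/minisweagent/config/benchmarks/swebench.yaml installs mswea-web-search and mswea-web-fetch into the container, and the prompt tells the model it can use them to look up public documentation.
Prompt iteration: scripts/search_system_prompt.py runs an offline prompt policy search. It:

Generates a prompt override
Runs one round of the SWE-bench canary
Analyzes the failed trajectories
Feeds the failure modes back into the next round of prompt search

This is not the agent "editing its system prompt as it works" within a single task; it is a multi-round evaluation loop driven by an external search script.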
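
The four steps above can be sketched as a loop in which each round's failure analysis becomes the input to the next round. This is only an illustration of the control flow: generate_override, run_canary, analyze_failures and prompt_search are hypothetical stubs invented here, not functions from scripts/search_system_prompt.py.

```bash
#!/usr/bin/env bash
# Sketch of a multi-round prompt search loop.
# Every function below is a hypothetical stub for illustration only.

generate_override() { printf 'round=%s hints=%s\n' "$1" "$2"; }      # step 1
run_canary()        { printf 'trajectories for [%s]\n' "$1"; }        # step 2
analyze_failures()  { printf 'failure-modes-from-round-%s\n' "$1"; }  # step 3

prompt_search() {
  local rounds="$1" feedback="none" round=1 override trajectories
  while [ "$round" -le "$rounds" ]; do
    override="$(generate_override "$round" "$feedback")"
    trajectories="$(run_canary "$override")"
    # Step 4: this round's failure modes become the next round's hints.
    feedback="$(analyze_failures "$round" "$trajectories")"
    echo "$override"
    round=$((round + 1))
  done
}

prompt_search 3
```

The point of the sketch is that the loop lives outside the agent: each override is fixed for the whole canary round and only changes between rounds.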

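Prompt search can now be enabled directly from the main script

run_swebench_full.sh now supports running a prompt search first and then automatically continuing with the formal generation/evaluation using the best override. It also supports a lighter 8-case validation mode, intended for fast prompt / model iteration, that reuses existing artifacts wherever possible and reduces disk usage: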

bash

LEAN_VALIDATION=1 \
MODEL=openai/gpt-5.2-2025-12-11 \
EXTRA_CONFIG_FILE=prompt_opt_runs/search_20260313_144354/best_prompt_override.yaml \
bash run_swebench_full.sh

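By default, this mode:

Shrinks the generation slice to the first 8 cases (override with VALIDATION_CASES or VALIDATION_SLICE_SPEC)
Writes output to a smaller directory (default runs/validation_8)
Evaluates only these 8 cases, with no additional truncation pass
Disables aggressive cleanup and defaults to a single worker, avoiding pointless repeated builds and cleanups
Does not rerun existing predictions; to force a redo, set REDO_EXISTING=1 explicitly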
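
One common way to get "defaults that the caller can still override" in a shell script is the `${VAR:=default}` expansion, which assigns only when the variable is unset or empty. The sketch below assumes that pattern. It is not taken from run_swebench_full.sh, the function name apply_lean_defaults and the variable OUTPUT_DIR are made up for illustration, and mapping "single worker" to EVAL_MAX_WORKERS is also an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: apply lean-validation defaults without clobbering
# values the caller has already set.

apply_lean_defaults() {
  # Do nothing unless lean validation was requested.
  [ "${LEAN_VALIDATION:-0}" = "1" ] || return 0
  : "${VALIDATION_CASES:=8}"              # first 8 cases unless overridden
  : "${OUTPUT_DIR:=runs/validation_8}"    # OUTPUT_DIR is an assumed name
  : "${EVAL_MAX_WORKERS:=1}"              # assumed variable for "single worker"
  : "${REDO_EXISTING:=0}"                 # keep existing predictions by default
}

LEAN_VALIDATION=1
VALIDATION_CASES=4   # caller override survives
apply_lean_defaults
echo "cases=$VALIDATION_CASES redo=$REDO_EXISTING"
```

With this pattern, an explicit REDO_EXISTING=1 on the command line wins over the default, which matches the override behavior described in the list.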

Recommended iteration command:

bash

LEAN_VALIDATION=1 \
MODEL=openai/gpt-5.2-2025-12-11 \
EXTRA_CONFIG_FILE=prompt_opt_runs/search_20260313_144354/best_prompt_override.yaml \
DO_GENERATE=1 DO_EVALUATE=1 \
bash run_swebench_full.sh

To re-evaluate using only existing predictions:

bash

LEAN_VALIDATION=1 DO_GENERATE=0 DO_EVALUATE=1 bash run_swebench_full.sh

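Stable 500-case runs

For large-batch evaluation, run_swebench_full.sh now applies storage protection automatically by default:

Generation defaults to GEN_WORKERS=2
When the evaluation covers 60 or more cases, it automatically switches to chunked cleanup mode
When the evaluation covers 300 or more cases, it automatically tightens further to a more stable configuration: EVAL_CHUNK_SIZE=2, EVAL_MAX_WORKERS=1, DISK_GB_THRESHOLD=15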
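
The two thresholds above amount to a small decision table keyed on the number of cases. A minimal sketch of that selection follows. The function name select_eval_mode and the mode labels are hypothetical, and only the thresholds and the three settings come from the notes above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the threshold logic for storage protection.
# Prints the mode that would apply for a given number of evaluation cases.

select_eval_mode() {
  local n_cases="$1"
  if [ "$n_cases" -ge 300 ]; then
    # Tightest settings for very large runs.
    echo "mode=strict EVAL_CHUNK_SIZE=2 EVAL_MAX_WORKERS=1 DISK_GB_THRESHOLD=15"
  elif [ "$n_cases" -ge 60 ]; then
    echo "mode=chunked"
  else
    echo "mode=default"
  fi
}

for n in 8 60 500; do
  printf '%s cases: %s\n' "$n" "$(select_eval_mode "$n")"
done
```

Checking the larger threshold first keeps the branches mutually exclusive, so a 500-case run gets the strict settings rather than plain chunked cleanup.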

Recommended command:

bash

MODEL=openai/gpt-5.4-2026-03-05 \
DO_GENERATE=1 \
DO_EVALUATE=1 \
bash run_swebench_full.sh

If your priority is solving as many cases as possible rather than saving tokens / time, the main entry point can now switch directly to the clean profile:

bash

MODEL=openai/gpt-5.4-2026-03-05 \
SPEED_PROFILE=clean \
HIGH_ACCURACY_PRESET=1 \
DO_GENERATE=1 \
DO_EVALUATE=1 \
bash run_swebench_full.sh

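This profile keeps the full multi-agent prompt stack, but additionally enables a higher-recall clean overlay and disables the automatic storage-protection switching for large-batch evaluation, which suits scenarios where you simply want to solve as many cases as possible. In addition, `HIGH_ACCURACY_PRESET=1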

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--tuyuanpeng--tmpfile
slug: tuyuanpeng--tmpfile
source: huggingface
author: Tuyuanpeng
license
tags: region:us

⚙️ Technical Specs

architecture: null
params billions: null
context length: null
pipeline tag

📊 Engagement & Metrics

downloads: 27,509
stars: null
forks: null

Data indexed from public sources. Updated daily.