Dataset

Forgespectrum 114k

by hardiksharma6555 hf-dataset--hardiksharma6555--forgespectrum-114k
Free2AITools Nexus Index
59.6 Top 100%
S: Semantic 50
A: Authority 62
P: Popularity 52
R: Recency 93
Q: Quality 50
Tech Context
Vital Performance
0 DL / 30D
0.0%
Data Integrity 59.6 FNI Score
- Size
- Rows
Parquet Format
- Tokens
Dataset Information Summary
Entity Passport
Registry ID hf-dataset--hardiksharma6555--forgespectrum-114k
License CC-BY-NC-4.0
Provider huggingface
📜

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__hardiksharma6555__forgespectrum_114k,
  author = {hardiksharma6555},
  title = {Forgespectrum 114k Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/hardiksharma6555/forgespectrum-114k}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
hardiksharma6555. (2026). Forgespectrum 114k [Dataset]. Free2AITools. https://huggingface.co/datasets/hardiksharma6555/forgespectrum-114k

🔬 Technical Deep Dive


⚖️ Free2AITools Nexus Index V2.0

Semantic (S) 50
Authority (A) 62
Popularity (P) 52
Recency (R) 93
Quality (Q) 50


Downloads
33,414


Dataset Specification

ForgeSpectrum-114K

A multi-domain corpus of 139,756 real and AI-generated images across 5 domains (faces, scenes, documents, scene_text, id_cards) and 66 distinct generators, paired with structured 3-step reasoning traces annotated by Gemini-2.5-Pro.

Status

🚧 Annotation in progress. Currently ~85K of the 110K remaining images are annotated. New traces are pushed roughly every 30 minutes until annotation is complete.

Files

| File | Contents |
|---|---|
| annotations.jsonl | Step-1 anomalies, Step-2 attribute scores, Step-3 reasoning trace per image (growing) |
| full_manifest.jsonl | All 139,756 image paths + label + domain + generator + family |
| temporal_sidecar.jsonl | 5,339 post-2025 generator samples (Flux1.1Pro, HART, Infinity, MAGI, GPT-4o) for temporal eval |
| images/ | Raw image bytes, organized by wildforge/{real,fake}/{source}/... (uploading in batches) |
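
Because annotations.jsonl is still growing, it can help to check how much of the manifest is covered before training on it. The sketch below joins the two JSONL files on the image path and counts annotated images per domain. It assumes the manifest rows use the same `img_path` and `domain` keys as the annotation records, which the card does not state explicitly; `read_jsonl` and `annotation_coverage` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def annotation_coverage(manifest_lines: Iterable[str],
                        annotation_lines: Iterable[str]) -> dict:
    """Return {domain: (annotated_count, total_count)} for the manifest.

    Assumption: both files key each image by "img_path", and manifest rows
    carry a "domain" field.
    """
    annotated = {rec["img_path"] for rec in read_jsonl(annotation_lines)}
    total, done = Counter(), Counter()
    for row in read_jsonl(manifest_lines):
        total[row["domain"]] += 1
        if row["img_path"] in annotated:
            done[row["domain"]] += 1
    return {domain: (done[domain], total[domain]) for domain in total}
```

Both arguments accept any iterable of lines, so open file handles for full_manifest.jsonl and annotations.jsonl can be passed directly.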

Annotation schema (per record)

python
{
  "img_path": "...",
  "label": "real" | "fake",
  "domain": "faces" | "scenes" | "docs" | "scene_text" | "id_cards",
  "generator": "...",
  "family": "...",
  # Step 1 (5-vote consensus, T=0.8)
  "step1_anomalies": [{"type": str, "severity": str, "region": ...}],
  "step1_impression": "real" | "fake",
  "step1_confidence": float,
  # Step 2 (single deterministic call, T=0.0)
  "step2_attributes": {  # 8 forensic axes, 0-1
    "noise_consistency": float, "texture_coherence": float,
    "anatomical_accuracy": float, "lighting_consistency": float,
    "frequency_naturalness": float, "edge_quality": float,
    "color_authenticity": float, "semantic_coherence": float,
  },
  "step2_explanation": str,
  "step2_suspicious_regions": [...],
  # Step 3 (label-conditioned trace, T=0.2)
  "reasoning_trace": ".........real|fake",
  "annotator": "gemini-2.5-pro",
  "n_vote_samples": 5,
}
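
A minimal sanity check against the schema above might look like the following sketch. It only verifies what the schema states: that `label` is "real" or "fake" and that each of the 8 Step-2 axes is present with a score in the 0-1 range. `validate_record` is a hypothetical helper, not part of the dataset tooling, and it does not interpret what a high or low score means.

```python
from typing import Any

# The 8 forensic axes listed under "step2_attributes" in the schema.
STEP2_AXES = (
    "noise_consistency", "texture_coherence", "anatomical_accuracy",
    "lighting_consistency", "frequency_naturalness", "edge_quality",
    "color_authenticity", "semantic_coherence",
)
LABELS = {"real", "fake"}


def validate_record(rec: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    if rec.get("label") not in LABELS:
        problems.append(f"label: unexpected value {rec.get('label')!r}")
    attrs = rec.get("step2_attributes") or {}
    for axis in STEP2_AXES:
        value = attrs.get(axis)
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number:
            problems.append(f"{axis}: missing or not a number")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{axis}: {value} is outside [0, 1]")
    return problems
```

Running this over each parsed line of annotations.jsonl and collecting the non-empty results gives a quick view of malformed records while annotation is still in progress.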

Known limitations (honest disclosure)

  • Single-VLM annotator. All traces are from Gemini-2.5-Pro. Reviewers should treat 5-vote consensus as variance reduction on one annotator, not a true ensemble. A 3-VLM heterogeneous re-annotation on a 2K subset is in progress for a v2 release.
  • ~10–15% Step-1 ↔ label drift on real samples (Gemini's zero-shot impression sometimes calls real photos with noisy backgrounds "fake"). Step-3 traces are label-conditioned and recover, but the inconsistency is reported here transparently.
  • Region grounding via post-hoc detector pass. Step-1 anomalies do not yet contain bounding boxes; a GroundingDINO-1.5 retrofit using the existing artifact phrases is planned.
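
The Step-1 ↔ label drift described above can be measured directly from the annotation records. The sketch below computes, per domain, the fraction of records with a given ground-truth label whose `step1_impression` disagrees with it. `step1_disagreement` is a hypothetical helper, and treating a missing `step1_impression` as a disagreement is an assumption made here for simplicity.

```python
from collections import defaultdict
from typing import Iterable


def step1_disagreement(records: Iterable[dict], label: str = "real") -> dict[str, float]:
    """Per-domain fraction of `label` records whose step1_impression differs from the label.

    Records with a different ground-truth label are ignored. A record with no
    "step1_impression" field counts as a disagreement (assumption).
    """
    total = defaultdict(int)
    mismatched = defaultdict(int)
    for rec in records:
        if rec.get("label") != label:
            continue
        domain = rec.get("domain", "unknown")
        total[domain] += 1
        if rec.get("step1_impression") != label:
            mismatched[domain] += 1
    return {domain: mismatched[domain] / total[domain] for domain in total}
```

Calling it with the default `label="real"` on the parsed annotations gives a per-domain view of the drift, which can help decide whether to filter or down-weight Step-1 fields for real samples.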

License

Annotations: CC-BY-NC-4.0.

Image bytes are sourced from upstream public datasets (HydraFake, GenImage, FantasyID, OSTF, DocBank, Flickr/Wikimedia, etc.) and our own document-tampering pipeline. Each image inherits its upstream license; redistribution is provided under the upstream terms. See IMAGE_PROVENANCE.md for the source map.

Citation

Coming soon. Companion paper in preparation.

📊 Structured Schema (Zero-Fabrication)

| Feature Key | Data Type |
|---|---|
| image | Image |

Estimated Rows: 19,324

Social Proof

HuggingFace Hub
33.4K Downloads
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id
hf-dataset--hardiksharma6555--forgespectrum-114k
slug
hardiksharma6555--forgespectrum-114k
source
huggingface
author
hardiksharma6555
license
CC-BY-NC-4.0
tags
task_categories:image-classification, task_categories:visual-question-answering, language:en, license:cc-by-nc-4.0, size_categories:10k<n<100k, format:imagefolder, modality:image, library:datasets, library:mlcroissant, region:us, deepfake-detection, ai-generated-image-detection, forensics, vlm, reasoning-traces

⚙️ Technical Specs

architecture
null
params billions
null
context length
116,736
pipeline tag

📊 Engagement & Metrics

downloads
33,414
stars
0
forks
null

Data indexed from public sources. Updated daily.