TextPecker-1.5M (Dataset)

by CIawevy (hf-dataset--ciawevy--textpecker-1.5m)
Nexus Index V2.0: 41.1 (Top 100%)
Semantic (S) 50, Authority (A) 0, Popularity (P) 53, Recency (R) 83, Quality (Q) 30
Dataset Information Summary
Registry ID: hf-dataset--ciawevy--textpecker-1.5m
License: Apache-2.0
Provider: Hugging Face
Cite this dataset

BibTeX:
```bibtex
@misc{hf_dataset__ciawevy__textpecker_1.5m,
  author = {CIawevy},
  title = {TextPecker-1.5M Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/ciawevy/textpecker-1.5m}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
```

APA style:
CIawevy. (2026). TextPecker-1.5M [Dataset]. Free2AITools. https://huggingface.co/datasets/ciawevy/textpecker-1.5m

Downloads: 38,892

đŸ‘ī¸ Data Preview

📊

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

🔗 Explore Full Dataset ↗

đŸ§Ŧ Field Logic

đŸ§Ŧ

Schema not yet indexed for this dataset.

Dataset Specification

TextPecker-1.5M: A Dataset for Training and Evaluating TextPecker

This repository contains the TextPecker-1.5M dataset, a new benchmark proposed in the paper "TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering".

Code and Project Page

The official implementation and project details for TextPecker and the TextPecker-1.5M dataset are available in the GitHub repository: https://github.com/CIawevy/TextPecker

Sample Usage

You can load the TextPecker-1.5M dataset with the Hugging Face datasets library. The dataset has a single "default" configuration with two splits: train and test.

```python
from datasets import load_dataset

# Load the full TextPecker-1.5M dataset (a DatasetDict with train and test splits)
dataset = load_dataset("CIawevy/TextPecker-1.5M", "default")
train_data = dataset["train"]
test_data = dataset["test"]

# Alternatively, load one split directly (returns a Dataset instead of a DatasetDict)
train_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="train")
test_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="test")
```

For detailed instructions on installation, model download, evaluation, and running demos, please refer to the GitHub repository.
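Iterating a loaded split yields one dict per row. As a minimal sketch, assuming rows carry the data_source and class string columns listed in the structured schema, the hypothetical helper `count_by_field` below tallies how examples are distributed over one column. It is plain Python, not part of the datasets library or the TextPecker code, and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def count_by_field(rows: Iterable[Mapping[str, Any]], field: str) -> Counter:
    """Tally how many rows carry each value of `field` (hypothetical helper)."""
    counts: Counter = Counter()
    for row in rows:
        # Rows missing the field are grouped under None rather than raising.
        counts[row.get(field)] += 1
    return counts


# Offline usage with synthetic rows; the values are invented placeholders.
rows = [
    {"id": "a", "data_source": "source_x", "class": "c1"},
    {"id": "b", "data_source": "source_y", "class": "c1"},
    {"id": "c", "data_source": "source_x", "class": "c2"},
]
print(count_by_field(rows, "data_source"))  # Counter({'source_x': 2, 'source_y': 1})
```

On a real split, `count_by_field(train_data, "data_source")` should work as written, but iterating whole rows may also decode the image column, so it can be slow on the full train split.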

Citation

If you find this dataset useful for your research, please cite the accompanying paper:

```bibtex
@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
```

Structured Schema

| Feature key | Data type |
|---|---|
| id | string |
| images | unknown |
| conversations | unknown |
| data_source | string |
| class | string |
| ori_bbox | unknown |

Estimated rows: 1,483,089
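The schema above types three columns as string and leaves three as unknown. A minimal per-row sanity check, assuming rows arrive as dicts (as they do when iterating a loaded split), verifies that all six fields are present and that the string columns really hold strings; it deliberately does not guess types for images, conversations, or ori_bbox. `schema_problems` is a hypothetical helper, not part of the dataset or the datasets library.

```python
from typing import Any, Mapping

# Fields from the structured schema. Only the string-typed columns get a type
# check; the others are listed as "unknown", so only their presence is checked.
STRING_FIELDS = ("id", "data_source", "class")
UNTYPED_FIELDS = ("images", "conversations", "ori_bbox")


def schema_problems(row: Mapping[str, Any]) -> list[str]:
    """Return schema issues for one row; an empty list means it looks OK."""
    problems = []
    for name in STRING_FIELDS + UNTYPED_FIELDS:
        if name not in row:
            problems.append(f"missing field: {name}")
    for name in STRING_FIELDS:
        if name in row and not isinstance(row[name], str):
            problems.append(f"{name} should be str, got {type(row[name]).__name__}")
    return problems


# Synthetic row with a non-string id and three missing fields.
row = {"id": 5, "data_source": "example", "class": "example"}
# Prints: ['missing field: images', 'missing field: conversations',
#          'missing field: ori_bbox', 'id should be str, got int']
print(schema_problems(row))
```

Checking presence separately from type keeps the sketch honest about what the index actually records for each column.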


đŸ›Ąī¸ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: hf-dataset--ciawevy--textpecker-1.5m
slug: ciawevy--textpecker-1.5m
source: huggingface
author: CIawevy
license: Apache-2.0
tags: task_categories:image-to-text, license:apache-2.0, size_categories:1m<n<10m, format:parquet, format:optimized-parquet, modality:image, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, arxiv:2602.20903, region:us
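The tags value packs thirteen key:value pairs into one comma-separated string, and several keys (format, modality, library) repeat. A minimal sketch, using a hypothetical `parse_tags` helper, groups the values by key; it assumes the first colon separates key from value, which holds for every tag listed here.

```python
from collections import defaultdict


def parse_tags(tag_string: str) -> dict[str, list[str]]:
    """Group comma-separated "key:value" tags by key (hypothetical helper)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for tag in tag_string.split(","):
        tag = tag.strip()
        if not tag:
            continue  # skip empty entries, e.g. from an empty string
        # Split on the first colon only, so values may themselves contain ':'.
        key, _, value = tag.partition(":")
        grouped[key].append(value)
    return dict(grouped)


TAGS = (
    "task_categories:image-to-text, license:apache-2.0, size_categories:1m<n<10m, "
    "format:parquet, format:optimized-parquet, modality:image, modality:text, "
    "library:datasets, library:dask, library:polars, library:mlcroissant, "
    "arxiv:2602.20903, region:us"
)
print(parse_tags(TAGS)["library"])  # ['datasets', 'dask', 'polars', 'mlcroissant']
```

Keeping repeated keys as lists preserves every value instead of letting a later tag overwrite an earlier one.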

âš™ī¸ Technical Specs

architecture
null
params billions
0.005
context length
null
pipeline tag

Engagement & Metrics

stars: 0
forks: 0
