📊

Dataset

Textpecker 1.5m

Name: Textpecker 1.5m
Creator: CIawevy
License: Apache-2.0

by CIawevy hf-dataset--ciawevy--textpecker-1.5m

Nexus Index

41.1 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 53

R: Recency 83

Q: Quality 30

Tech Context

Vital Performance

0 DL / 30D

0.0%

Source →

Data Integrity 41.1 FNI Score

- Size

- Rows

Parquet Format

- Tokens

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--ciawevy--textpecker-1.5m
License	Apache-2.0
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__ciawevy__textpecker_1.5m,
  author = {CIawevy},
  title = {Textpecker 1.5m Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/ciawevy/textpecker-1.5m}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

CIawevy. (2026). Textpecker 1.5m [Dataset]. Free2AITools. https://huggingface.co/datasets/ciawevy/textpecker-1.5m

🔬Technical Deep Dive

Full Specifications [+]

⚖️ Nexus Index V2.0

Methodology Index Protocol

41.1

TOP 100% SYSTEM IMPACT

Semantic (S) 50

Authority (A) 0

Popularity (P) 53

Recency (R) 83

Quality (Q) 30

💬 Index Insight

FNI V2.0 for Textpecker 1.5m: Semantic (S:50), Authority (A:0), Popularity (P:53), Recency (R:83), Quality (Q:30).

Free2AITools Nexus Index

Verification Authority

HuggingFace API GitHub Metadata Arxiv Citation DB System Audit

Unbiased Data Node Refresh: VFS Live

⬇️

Downloads

38,892

👁️ Data Preview

📊

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

🔗 Explore Full Dataset ↗

🧬 Field Logic

🧬

Schema not yet indexed for this dataset.

Dataset Specification

TextPecker-1.5M: A Dataset for Training and evaluating TextPecker

This repository contains the TextPecker-1.5M dataset, a new benchmark proposed in the paper "TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering".

Code and Project Page

The official implementation and project details for the TextPecker and TextPecker-1.5M dataset can be found on the GitHub repository: https://github.com/CIawevy/TextPecker

Sample Usage

You can easily load the TextPecker-1.5M dataset using the Hugging Face datasets library. The dataset is provided in two configurations: train and test

python

from datasets import load_dataset

# Load the full TextPecker-1.5M dataset (includes train and test splits)
dataset = load_dataset("CIawevy/TextPecker-1.5M", "default")
train_data = dataset["train"]
test_data = dataset["test"]

# Load specific split directly (more efficient for practical usage)
train_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="train")
test_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="test")

For detailed instructions on installation, model download, evaluation, and running demos, please refer to the GitHub repository.

Citation

If you find this dataset useful for your research, please cite the accompanying paper:

bibtex

@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}

📊 Structured Schema (Zero-Fabrication)

Feature Key	Data Type
`id`	`string`
`images`	`unknown`
`conversations`	`unknown`
`data_source`	`string`
`class`	`string`
`ori_bbox`	`unknown`

Estimated Rows: 1,483,089

Social Proof

HuggingFace Hub

38.9KDownloads

Hub Discussions

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology 📚 Knowledge Baseℹ️ Verify with original source

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--ciawevy--textpecker-1.5m
slug: ciawevy--textpecker-1.5m
source: huggingface
author: CIawevy
license: Apache-2.0
tags: task_categories:image-to-text, license:apache-2.0, size_categories:1m<n<10m, format:parquet, format:optimized-parquet, modality:image, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, arxiv:2602.20903, region:us

⚙️ Technical Specs

architecture: null
params billions: 0.005
context length: null
pipeline tag

📊 Engagement & Metrics

downloads: 38,892
stars: 0
forks: null

Data indexed from public sources. Updated daily.

Welcome to Free2AI Tools!

Smart Search

FNI Score

You're All Set!