πŸ“Š Dataset: yanto991
by hapitz (hf-dataset--hapitz--yanto991)
Provider: huggingface | License: MIT | Format: Parquet
Cite this dataset

BibTeX

```bibtex
@misc{hf_dataset__hapitz__yanto991,
  author = {hapitz},
  title = {yanto991 Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/hapitz/yanto991}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
```

APA Style

hapitz. (2026). yanto991 [Dataset]. Free2AITools. https://huggingface.co/datasets/hapitz/yanto991
Dataset Specification

GroundCUA: Grounding Computer Use Agents on Human Demonstrations

🌐 Website | πŸ“‘ Paper | πŸ€— Dataset | πŸ€– Models

GroundCUA Overview

GroundCUA Dataset

GroundCUA is a large and diverse dataset of real UI screenshots paired with structured annotations for building multimodal computer use agents. It covers 87 software platforms across productivity tools, browsers, creative tools, communication apps, development environments, and system utilities. GroundCUA is designed for research on GUI grounding, UI perception, and vision-language-action models that interact with computers.


Highlights

  • 87 platforms spanning Windows, macOS, Linux, and cross-platform apps
  • Annotated UI elements with bounding boxes, text, and coarse semantic categories
  • SHA-256 file pairing between screenshots and JSON annotations
  • Supports research on GUI grounding, multimodal agents, and UI understanding
  • MIT license for broad academic and open source use

Dataset Structure

```text
GroundCUA/
β”œβ”€β”€ data/              # JSON annotation files
β”œβ”€β”€ images/            # Screenshot images
└── README.md
```

Directory Layout

Each platform appears as a directory name inside both data/ and images/.

  • data/PlatformName/ contains annotation JSON files
  • images/PlatformName/ contains corresponding PNG screenshots

A screenshot and its annotation file share the same SHA-256 hash as their filename stem.


File Naming Convention

Each screenshot has a matching annotation file using the same hash:

  • data/PlatformName/[hash].json
  • images/PlatformName/[hash].png

This structure ensures:

  • Unique identifiers for each screenshot
  • Easy pairing between images and annotations
  • Compatibility with pipelines that expect hash-based addressing
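As a sketch, the pairing rule above can be expressed as two small helpers. The `GroundCUA` root, `Firefox` platform name, and hash values in the usage below are placeholders for illustration, not paths taken from the dataset; the helper names are invented here.

```python
import hashlib
from pathlib import Path


def paired_paths(root: str, platform: str, file_hash: str) -> tuple[Path, Path]:
    """Return the (annotation, screenshot) pair addressed by one SHA-256 hash."""
    root_dir = Path(root)
    return (
        root_dir / "data" / platform / f"{file_hash}.json",
        root_dir / "images" / platform / f"{file_hash}.png",
    )


def matches_hash(png_bytes: bytes, file_hash: str) -> bool:
    """Check that a screenshot's content hashes to its own filename stem."""
    return hashlib.sha256(png_bytes).hexdigest() == file_hash


# Usage with placeholder values:
ann_path, img_path = paired_paths("GroundCUA", "Firefox", "abc123")
```

Hash-based addressing means a pipeline never needs a manifest: given either file, the path of its counterpart is fully determined.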

Annotation Format

Each annotation file is a list of UI element entries describing visible elements in the screenshot.

```json
[
  {
    "image_path": "PlatformName/screenshot_hash.png",
    "bbox": [x1, y1, x2, y2],
    "text": "UI element text",
    "category": "Element category",
    "id": "unique-id"
  }
]
```
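A minimal way to consume one annotation file, assuming only the schema above; the sample entry's values are illustrative stand-ins, not real dataset records:

```python
import json

# A single-entry example mirroring the schema above (values are invented).
SAMPLE_ANNOTATION = """[
  {"image_path": "Firefox/3f2a.png",
   "bbox": [10, 20, 110, 60],
   "text": "Open file",
   "category": "Button",
   "id": "elem-0001"}
]"""


def element_texts_and_boxes(json_text: str) -> list[tuple[str, list[int]]]:
    """Collect (text, bbox) pairs from one annotation file's JSON content."""
    return [(e["text"], e["bbox"]) for e in json.loads(json_text)]
```

In practice `json_text` would come from reading a `data/PlatformName/[hash].json` file.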

Field Descriptions

image_path
Relative path to the screenshot.

bbox
Bounding box coordinates [x1, y1, x2, y2] in pixel space.

text
Visible text or a short description of the element.

category
Coarse UI type label. Present only for some elements.

id
Unique identifier for the annotation entry.
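Two small helpers follow directly from these field definitions. The function names are hypothetical, and defaulting missing labels to "Others" is an assumption based on the category list in this card, not a documented rule:

```python
def bbox_size(bbox: list[int]) -> tuple[int, int]:
    """Width and height, in pixels, of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1


def element_category(entry: dict) -> str:
    """'category' is present only for some elements; fall back to 'Others'."""
    return entry.get("category", "Others")
```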


UI Element Categories

Categories are approximate and not guaranteed for all elements. Examples include:

  • Button
  • Menu
  • Input Elements
  • Navigation
  • Sidebar
  • Visual Elements
  • Information Display
  • Others

These labels provide light structure for UI grounding tasks but do not form a full ontology.
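Because labels are optional and coarse, one lightweight way to profile a platform's annotations is a histogram over categories. This sketch assumes unlabelled entries should be bucketed under "Others":

```python
from collections import Counter


def category_histogram(entries: list[dict]) -> Counter:
    """Count elements per coarse category; unlabelled entries go to 'Others'."""
    return Counter(e.get("category", "Others") for e in entries)
```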


Example Use Cases

GroundCUA can be used for:

  • Training computer use agents to perceive and understand UI layouts
  • Building GUI grounding modules for VLA agents
  • Pretraining screen parsing and UI element detectors
  • Benchmarking OCR, layout analysis, and cross-platform UI parsing
  • Developing models that map UI regions to natural language or actions
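For the last use case, a hedged sketch of turning one annotation entry into an (instruction, action) supervision pair, taking the bbox centre as the click point. The instruction template and field names in the output are invented for illustration; the dataset itself ships only the annotations described above:

```python
def click_target(entry: dict) -> tuple[float, float]:
    """Map an annotated element to an action point: the centre of its bbox."""
    x1, y1, x2, y2 = entry["bbox"]
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def grounding_example(entry: dict) -> dict:
    """One (instruction, action) pair for supervising a GUI grounding model."""
    return {
        "instruction": f"Click the element labelled '{entry['text']}'",
        "action": {"type": "click", "point": click_target(entry)},
    }
```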

Citation

If you use GroundCUA in your research, please cite our work:

```bibtex
@misc{feizi2025groundingcomputeruseagents,
  title={Grounding Computer Use Agents on Human Demonstrations},
  author={Aarash Feizi and Shravan Nayak and Xiangru Jian and Kevin Qinghong Lin and Kaixin Li and Rabiul Awal and Xing Han LΓΉ and Johan Obando-Ceron and Juan A. Rodriguez and Nicolas Chapados and David Vazquez and Adriana Romero-Soriano and Reihaneh Rabbany and Perouz Taslakian and Christopher Pal and Spandana Gella and Sai Rajeswar},
  year={2025},
  eprint={2511.07332},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2511.07332},
}
```

License

GroundCUA is released under the MIT License.
Users are responsible for ensuring compliance with all applicable laws and policies.


πŸ›‘οΈ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

πŸ†” Identity & Source

id: hf-dataset--hapitz--yanto991
slug: hapitz--yanto991
source: huggingface
author: hapitz
license: MIT
tags: task_categories:image-to-text, language:en, license:mit, size_categories:1m<n<10m, modality:image, arxiv:2511.07332, region:us, computer_use, agents, grounding, multimodal, ui-vision, groundcua

βš™οΈ Technical Specs

architecture
null
params billions
null
context length
null
pipeline tag

πŸ“Š Engagement & Metrics

downloads: 116,746
stars: 0
forks: 0

Data indexed from public sources. Updated daily.