OmniParser

by microsoft

Library: transformers | License: MIT | Pipeline tag: image-text-to-text

🕐 Updated 12/19/2025

About

OmniParser is a general screen parsing tool that interprets UI screenshots and converts them into a structured format, with the aim of improving existing LLM-based UI agents. Training datasets include: 1) an interactable icon detection dataset, curated from popular web pages and automatically annotated to highlight clickable and actionable regions, and 2) an icon description dataset, designed t...
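
To make "structured format" concrete, here is a minimal sketch of what a parsed screen could look like once it is handed to an LLM-based agent. The ScreenElement class, its field names, the to_prompt_lines helper, and the sample values are all hypothetical and chosen for illustration; they are not OmniParser's actual output schema or API.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ScreenElement:
    """Hypothetical record for one detected UI element (not OmniParser's real schema)."""
    element_id: int
    bbox: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]
    interactable: bool                       # True if the region is clickable/actionable
    description: str                         # short text description of the element


def to_prompt_lines(elements):
    """Render each element as one text line that could be placed in an LLM prompt."""
    lines = []
    for el in elements:
        kind = "interactable" if el.interactable else "static"
        x0, y0, x1, y1 = el.bbox
        lines.append(
            f"[{el.element_id}] {kind} '{el.description}' "
            f"at ({x0:.2f}, {y0:.2f}, {x1:.2f}, {y1:.2f})"
        )
    return lines


# Made-up values standing in for what a screen parser might report.
elements = [
    ScreenElement(0, (0.10, 0.05, 0.20, 0.10), True, "search button"),
    ScreenElement(1, (0.30, 0.40, 0.90, 0.60), False, "article text"),
]

structured = json.dumps([asdict(el) for el in elements], indent=2)
prompt_lines = to_prompt_lines(elements)

print(structured)
print("\n".join(prompt_lines))
```

Keeping one record per element, with a flag for whether it can be acted on, mirrors the two training datasets described above: one concerns which regions are interactable, the other how an icon is described.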

📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
• Data source: Hugging Face, https://huggingface.co/microsoft/OmniParser (fetched 2025-12-19T07:41:01.176Z, adapter version 3.2.0).

📚 Related Resources

📄 Related Papers

No related papers linked yet. Check the model's official documentation for research papers.

📊 Training Datasets

No training datasets are linked for this entry. The About section summarizes the training data; refer to the original model card for details.

🔗 Related Models V6.2

• phi-2 (sibling)
• phi-4 (sibling)
• VibeVoice-1.5B (sibling)
• Florence-2-large (sibling)
• Phi-3-mini-128k-instruct (sibling)

Model Information Summary
Model Name	OmniParser
Author	microsoft
Type	image-text-to-text
Downloads	472
Likes	1,697
Source	Hugging Face
Last Updated	December 19, 2025
