🧠 Model

OmniParser

by microsoft

library_name: transformers, license: mit, pipeline_tag: image-text-to-text

πŸ• Updated 12/19/2025

About

📢 [Project Page] [Blog Post] [Demo]

OmniParser is a general screen parsing tool that converts UI screenshots into a structured format in order to improve existing LLM-based UI agents. Its training datasets include: 1) an interactable icon detection dataset, curated from popular web pages and automatically annotated to highlight clickable and actionable regions, and 2) an icon description dataset, designed t...
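To make "structured format" concrete, here is a minimal Python sketch of how parsed screen elements might be represented and turned into text for an LLM agent. The `UIElement` class, its field names, the `to_prompt` helper, and the sample values are all hypothetical and chosen for illustration; they are not OmniParser's actual output schema or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """Hypothetical record for one detected screen element (not OmniParser's real schema)."""
    element_id: int
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]
    interactable: bool
    description: str


def to_prompt(elements: List[UIElement]) -> str:
    """Serialize the interactable elements into a text list an LLM agent could read."""
    lines = []
    for el in elements:
        if not el.interactable:
            continue  # skip regions not flagged as clickable or actionable
        x0, y0, x1, y1 = el.bbox
        lines.append(
            f"[{el.element_id}] {el.description} "
            f"at ({x0:.2f}, {y0:.2f}, {x1:.2f}, {y1:.2f})"
        )
    return "\n".join(lines)


# Made-up example values, for illustration only.
elements = [
    UIElement(0, (0.10, 0.05, 0.20, 0.10), True, "Search button"),
    UIElement(1, (0.00, 0.00, 1.00, 0.04), False, "Title bar"),
    UIElement(2, (0.85, 0.05, 0.95, 0.10), True, "Settings icon"),
]
prompt = to_prompt(elements)
print(prompt)
```

Under these assumptions, an agent would receive a short numbered list of actionable regions instead of raw pixels, and could refer to an element by its ID when choosing where to click.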

πŸ“ Limitations & Considerations

  • Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • FNI scores are relative rankings and may change as new models are added.
  • Data source: [{"source_platform":"huggingface","source_url":"https://huggingface.co/microsoft/OmniParser","fetched_at":"2025-12-19T07:41:01.176Z","adapter_version":"3.2.0"}]
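The data source entry above is a JSON array holding one provenance record. As a minimal sketch, it can be read with Python's standard library to check where and when the listing was fetched. The variable names here are illustrative; only the field names and values come from the record itself.

```python
import json
from datetime import datetime, timezone

# The provenance record as it appears in the data source note.
raw = (
    '[{"source_platform":"huggingface",'
    '"source_url":"https://huggingface.co/microsoft/OmniParser",'
    '"fetched_at":"2025-12-19T07:41:01.176Z",'
    '"adapter_version":"3.2.0"}]'
)

records = json.loads(raw)
record = records[0]

# Swap the trailing "Z" for an explicit offset so that
# datetime.fromisoformat also accepts it on Python versions before 3.11.
fetched_at = datetime.fromisoformat(record["fetched_at"].replace("Z", "+00:00"))

print(record["source_platform"], record["source_url"])
print(fetched_at.astimezone(timezone.utc).date())
```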

📚 Related Resources

📄 Related Papers

No related papers linked yet. Check the model's official documentation for research papers.

📊 Training Datasets

Training data information not available. Refer to the original model card for details.
