OmniParser
by microsoft
Library: transformers | License: MIT | Pipeline tag: image-text-to-text
About
[Project Page] [Blog Post] [Demo]
OmniParser is a general screen-parsing tool that interprets UI screenshots and converts them into a structured format, with the goal of improving existing LLM-based UI agents. Its training datasets include:
1) an interactable icon detection dataset, curated from popular web pages and automatically annotated to highlight clickable and actionable regions, and
2) an icon description dataset, designed t...
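To make "structured format" concrete, here is a minimal sketch of what a parsed screen might look like once it is handed to an LLM agent. The `UIElement` class, its fields, and the `to_prompt` and `click_point` helpers are hypothetical illustrations, not OmniParser's actual output schema or API. The `interactable` and `description` fields simply mirror the two training datasets described above (detecting actionable regions and describing icons), and bounding boxes are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema for illustration only; OmniParser's real output may differ.
@dataclass
class UIElement:
    element_id: int
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed normalized to [0, 1]
    interactable: bool  # would come from an interactable-region detector
    description: str    # would come from an icon-description model


def to_prompt(elements: List[UIElement]) -> str:
    """Serialize parsed elements into numbered plain text for an LLM agent."""
    lines = []
    for el in elements:
        kind = "interactable" if el.interactable else "static"
        x1, y1, x2, y2 = el.bbox
        lines.append(
            f"[{el.element_id}] {kind}: {el.description} "
            f"@ ({x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f})"
        )
    return "\n".join(lines)


def click_point(el: UIElement) -> Tuple[float, float]:
    """Center of the element's box: a point the agent could click once it picks an ID."""
    x1, y1, x2, y2 = el.bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Example: two hand-written elements standing in for a parser's output.
elements = [
    UIElement(0, (0.10, 0.05, 0.18, 0.09), True, "search icon"),
    UIElement(1, (0.40, 0.20, 0.60, 0.25), False, "page title text"),
]
print(to_prompt(elements))
print(click_point(elements[0]))
```

In this sketch, serializing elements as numbered lines lets the agent reply with an element ID instead of raw pixel coordinates; the caller then maps that ID back to a click point.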
Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- Data source: Hugging Face (https://huggingface.co/microsoft/OmniParser), fetched 2025-12-19T07:41:01.176Z, adapter version 3.2.0.
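The VRAM bullet above can be made concrete with a weights-only estimate: parameter count times bytes per parameter. The sketch below assumes a hypothetical 1-billion-parameter model (this page does not state OmniParser's actual size) and deliberately ignores activations, caches, and framework overhead, which grow with batch size and input resolution. The `weight_memory_gib` helper and `HYPOTHETICAL_PARAMS` constant are illustrative names, not part of any library.

```python
# Approximate storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

# Hypothetical parameter count, chosen only for illustration.
HYPOTHETICAL_PARAMS = 1_000_000_000


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed for model weights alone, in GiB.

    Excludes activations and runtime overhead, so real VRAM usage is higher
    and increases with batch size.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("fp32", "fp16", "int8", "int4"):
    print(f"{dtype}: {weight_memory_gib(HYPOTHETICAL_PARAMS, dtype):.2f} GiB")
```

Each halving of bytes per parameter halves the weight footprint, which is why quantization is the first lever for fitting a model into limited VRAM.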
Related Resources
Related Papers
No related papers linked yet. Check the model's official documentation for research papers.
Training Datasets
Training data information not available. Refer to the original model card for details.