Model

GOT-OCR2.0

by StepFun AI · ID: hf-model--stepfun-ai--got-ocr2_0
Scale 716,033,280 params (≈0.72B)
FNI Rank 31
Percentile Top 5%
Activity: 0.0%


Audited 31 FNI Score
716,033,280 Params (≈0.72B)
4k Context
16.1K Downloads
~3 GB Est. VRAM (4-bit)
Model Information Summary
Entity Passport
Registry ID hf-model--stepfun-ai--got-ocr2_0
Provider huggingface
💾 Compute Threshold

~3 GB VRAM (estimated)

* Estimated for 4-Bit Quantization. Actual usage varies by context length and parallel batching.
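For context, the snippet below is a minimal sketch of what 4-bit loading could look like with Hugging Face Transformers and bitsandbytes. GOT-OCR2.0 ships custom remote code, so whether this quantized path works for it is an assumption, not something stated on this page.

# Illustrative 4-bit load via bitsandbytes; GOT-OCR2.0 support for this path is assumed, not verified here.
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained('stepfun-ai/GOT-OCR2_0', trust_remote_code=True)
model = AutoModel.from_pretrained(
    'stepfun-ai/GOT-OCR2_0',
    trust_remote_code=True,
    quantization_config=quant_config,
    device_map='auto',
)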

πŸ•ΈοΈ Neural Mesh Hub

Interconnecting Research, Data & Ecosystem

πŸ•ΈοΈ

Intelligence Hive

Multi-source Relation Matrix

Live Index
πŸ“œ

Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__stepfun_ai__got_ocr2_0,
  author = {StepFun AI},
  title = {GOT-OCR2.0 Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/stepfun-ai/GOT-OCR2_0}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
StepFun AI. (2026). GOT-OCR2.0 [Model]. Free2AITools. https://huggingface.co/stepfun-ai/GOT-OCR2_0

🔬 Technical Deep Dive

Full Specifications

⚡ Quick Commands

🤗 HF Download
huggingface-cli download stepfun-ai/GOT-OCR2_0
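The same download can also be done from Python with huggingface_hub; the local_dir below is an illustrative choice, not a required path.

# Python alternative to the CLI command above.
from huggingface_hub import snapshot_download

# Omit local_dir to use the default Hugging Face cache directory.
snapshot_download(repo_id='stepfun-ai/GOT-OCR2_0', local_dir='./GOT-OCR2_0')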

βš–οΈ Free2AI Nexus Index

Methodology β†’ πŸ“˜ What is FNI?
31.0
Top 5% Overall Impact
πŸ”₯ Popularity (P) 0
πŸš€ Velocity (V) 0
πŸ›‘οΈ Credibility (C) 0
πŸ”§ Utility (U) 0
Nexus Verified Data

πŸ’¬ Why this score?

This Got Ocr2 0 has a P score of 0 (popularity from downloads/likes), V of 0 (growth velocity), C of 0 (credibility from citations), and U of 0 (utility/deploy support).

Data Verified · Last Updated: Not calculated
Free2AI Nexus Index | Fair · Transparent · Explainable | Full Methodology
---

🚀 What's Next?

README


pipeline_tag: image-text-to-text
language:
  - multilingual
tags:
  - got
  - vision-language
  - ocr2.0
  - custom_code
license: apache-2.0

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

🔋 Online Demo | 🌟 GitHub | 📜 Paper

Haoran Wei*, Chenglong Liu*, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang

[figure: GOT-OCR2.0 overview image (jpeg)]

Usage

Inference using Hugging Face Transformers on NVIDIA GPUs. Requirements tested on Python 3.10:

torch==2.0.1
torchvision==0.15.2
transformers==4.37.2
tiktoken==0.6.0
verovio==4.3.1
accelerate==0.28.0
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
model = model.eval().cuda()


# input your test image
image_file = 'xxx.jpg'

# plain texts OCR
res = model.chat(tokenizer, image_file, ocr_type='ocr')

# format texts OCR:
# res = model.chat(tokenizer, image_file, ocr_type='format')

# fine-grained OCR:
# res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_box='')
# res = model.chat(tokenizer, image_file, ocr_type='format', ocr_box='')
# res = model.chat(tokenizer, image_file, ocr_type='ocr', ocr_color='')
# res = model.chat(tokenizer, image_file, ocr_type='format', ocr_color='')

# multi-crop OCR:
# res = model.chat_crop(tokenizer, image_file, ocr_type='ocr')
# res = model.chat_crop(tokenizer, image_file, ocr_type='format')

# render the formatted OCR results:
# res = model.chat(tokenizer, image_file, ocr_type='format', render=True, save_render_file = './demo.html')

print(res)

More details about 'ocr_type', 'ocr_box', 'ocr_color', and 'render' can be found at our GitHub.
Our training codes are available at our GitHub.
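As a small extension of the usage snippet above, the sketch below loops the same plain-text model.chat call over a folder of images; the folder path and .jpg extension are illustrative assumptions, and it reuses the model and tokenizer already loaded above.

# Sketch: plain-text OCR over a folder of images, reusing the model and
# tokenizer created in the usage snippet above. The path is illustrative.
from pathlib import Path

image_dir = Path('./images')
for image_path in sorted(image_dir.glob('*.jpg')):
    text = model.chat(tokenizer, str(image_path), ocr_type='ocr')
    print(f'--- {image_path.name} ---')
    print(text)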

More Multimodal Projects

πŸ‘ Welcome to explore more multimodal projects of our team:

Vary | Fox | OneChart

Citation

If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️!

@article{wei2024general,
  title={General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model},
  author={Wei, Haoran and Liu, Chenglong and Chen, Jinyue and Wang, Jia and Kong, Lingyu and Xu, Yanming and Ge, Zheng and Zhao, Liang and Sun, Jianjian and Peng, Yuang and others},
  journal={arXiv preprint arXiv:2409.01704},
  year={2024}
}
@article{liu2024focus,
  title={Focus Anywhere for Fine-grained Multi-page Document Understanding},
  author={Liu, Chenglong and Wei, Haoran and Chen, Jinyue and Kong, Lingyu and Ge, Zheng and Zhu, Zining and Zhao, Liang and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2405.14295},
  year={2024}
}
@article{wei2023vary,
  title={Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models},
  author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yang, Jinrong and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2312.06109},
  year={2023}
}

πŸ“ Limitations & Considerations

  • β€’ Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • β€’ VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • β€’ FNI scores are relative rankings and may change as new models are added.
  • β€’ Source: Unknown
Top Tier

Social Proof

HuggingFace Hub
1.5K Likes
16.1K Downloads
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology | 📚 Knowledge Base | ℹ️ Verify with original source

πŸ›‘οΈ Model Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id
hf-model--stepfun-ai--got-ocr2_0
source
huggingface
author
Stepfun Ai
tags
safetensors · got · vision-language · ocr2.0 · custom_code · image-text-to-text · multilingual · arxiv:2409.01704 · arxiv:2405.14295 · arxiv:2312.06109 · license:apache-2.0 · region:us

βš™οΈ Technical Specs

architecture
GOT
parameters
716,033,280 (≈0.72B)
context length
4,096
pipeline tag
image-text-to-text
vram gb
≈3.0
vram is estimated
true
vram formula
VRAM (GB) ≈ (params in billions × 0.75) + 2 GB (KV cache) + 0.5 GB (overhead)
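A minimal worked example of this formula for GOT-OCR2.0's parameter count, assuming the params term is expressed in billions:

# Worked example of the VRAM estimate stated above.
params_billions = 716_033_280 / 1e9  # ≈0.716B parameters
vram_gb = params_billions * 0.75 + 2.0 + 0.5  # weights + KV cache + overhead
print(f'Estimated VRAM: {vram_gb:.2f} GB')  # ≈3.04 GB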

📊 Engagement & Metrics

likes
1,526
downloads
16,073

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)