
layoutlm-document-qa

by impira · Model ID: hf-model--impira--layoutlm-document-qa
FNI 6.4
Top 74%

"This is a fine-tuned version of the multi-modal Layou......"

🔗 View Source
Audited 6.4 FNI Score
127,793,412 Params (≈0.13B)
4k Context
17.3K Downloads
~2.6GB Est. VRAM

⚡ Quick Commands

🤗 HF Download
huggingface-cli download impira/layoutlm-document-qa
📦 Install Lib
pip install -U transformers
📊 Engineering Specs

⚡ Hardware

Parameters
0.13B
Architecture
layoutlm
Context Length
4K
Model Size
3.8GB

🧠 Lifecycle

Library
-
Precision
float16
Tokenizer
-

🌐 Identity

Source
HuggingFace
License
Open Access
💾 Est. VRAM Benchmark

~2.6GB


* Technical estimation for FP16/Q4 weights. Does not include OS overhead or long-context batching. For Technical Reference Only.

📈 Interest Trend


* Real-time activity index across HuggingFace, GitHub and Research citations.

No similar models found.

🖥️ Hardware Compatibility

Multi-Tier Validation Matrix

🎮 RTX 3060 / 4060 Ti (Entry, 8GB VRAM): Compatible
🎮 RTX 4070 Super (Mid, 12GB VRAM): Compatible
💻 RTX 4080 / Mac M3 (High, 16GB VRAM): Compatible
🚀 RTX 3090 / 4090 (Pro, 24GB VRAM): Compatible
🏗️ RTX 6000 Ada (Workstation, 48GB VRAM): Compatible
🏭 A100 / H100 (Datacenter, 80GB VRAM): Compatible
ℹ️ Pro Tip: Compatibility is estimated for 4-bit quantization (Q4). High-precision (FP16) or ultra-long context windows will significantly increase VRAM requirements.
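As a back-of-envelope check of the tip above, weight footprint at different precisions follows directly from bytes per parameter. This is a simplification that counts weight storage only, ignoring activations, KV cache, and framework overhead; the precision set and 1 GB = 1e9 bytes convention are assumptions, not the site's exact method:

```python
# Bytes stored per parameter at each precision (assumed values:
# FP16 = 2 bytes, 8-bit = 1 byte, 4-bit = 0.5 bytes).
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def weight_footprint_gb(params: int, precision: str) -> float:
    """Weights-only memory footprint in GB (1 GB = 1e9 bytes).
    Ignores activations, KV cache, and runtime overhead."""
    return params * BYTES_PER_PARAM[precision] / 1e9

params = 127_793_412  # layoutlm-document-qa parameter count
print(round(weight_footprint_gb(params, "fp16"), 3))  # ~0.256
print(round(weight_footprint_gb(params, "q4"), 3))    # ~0.064
```

At roughly 0.26GB in FP16, this model's weights fit comfortably on every tier in the matrix above; the quantization caveat matters far more for multi-billion-parameter models.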

README

LayoutLM for Visual Question Answering

This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on documents. It has been fine-tuned using both the SQuAD2.0 and DocVQA datasets.

Getting started with the model

To run these examples, you must have PIL, pytesseract, and PyTorch installed in addition to transformers.

from transformers import pipeline

nlp = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

nlp(
    "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "What is the invoice number?"
)
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}

nlp(
    "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
    "What is the purchase amount?"
)
# {'score': 0.9912159, 'answer': '$1,000,000,000', 'start': 97, 'end': 97}

nlp(
    "https://www.accountingcoach.com/wp-content/uploads/2013/10/[email protected]",
    "What are the 2020 net sales?"
)
# {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}

NOTE: This model and pipeline were recently added to transformers via PR #18407 and PR #18414, so you'll need a recent version of transformers, for example:

pip install git+https://github.com/huggingface/transformers.git@2ef774211733f0acf8d3415f9284c49ef219e991

About us

This model was created by the team at Impira.


📝 Limitations & Considerations

  • â€ĸ Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • â€ĸ VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • â€ĸ FNI scores are relative rankings and may change as new models are added.
  • ⚠ License Unknown: Verify licensing terms before commercial use.
  • â€ĸ Source: Unknown
📜 Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__impira__layoutlm_document_qa,
  author = {impira},
  title = {layoutlm-document-qa},
  year = {2026},
  howpublished = {\url{https://huggingface.co/impira/layoutlm-document-qa}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
impira. (2026). layoutlm-document-qa [Model]. Free2AITools. https://huggingface.co/impira/layoutlm-document-qa
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology · 📚 Knowledge Base · ℹ️ Verify with original source

đŸ›Ąī¸ Model Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id
hf-model--impira--layoutlm-document-qa
author
impira
tags
transformers, pytorch, tf, safetensors, layoutlm, document-question-answering, pdf, en, license:mit, endpoints_compatible, region:us

âš™ī¸ Technical Specs

architecture
layoutlm
params
127,793,412
context length
4,096
vram gb
~2.6
vram is estimated
true
vram formula
VRAM ≈ (params_in_billions * 0.75) + 2GB (KV) + 0.5GB (OS)
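The formula above can be reproduced with a short sketch. The 0.75 GB-per-billion-parameters coefficient and the fixed 2GB KV-cache and 0.5GB OS overheads come from the formula as listed; treat the result as a rough planning figure, not a measurement:

```python
def estimate_vram_gb(params: int) -> float:
    """Rough VRAM estimate per the listed formula:
    VRAM ~ (params_in_billions * 0.75) + 2GB KV cache + 0.5GB OS."""
    params_billions = params / 1e9
    return params_billions * 0.75 + 2.0 + 0.5

# layoutlm-document-qa has 127,793,412 parameters (~0.13B)
print(round(estimate_vram_gb(127_793_412), 2))  # ~2.6
```

Note that applying the per-billion coefficient to the raw parameter count instead of the billions figure would inflate the result by a factor of 1e9, which is exactly the kind of discrepancy the "vram is estimated" flag should prompt you to double-check.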

📊 Engagement & Metrics

likes
1,152
downloads
17,295

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)