# blip-image-captioning-base

by Salesforce

---
pipeline_tag: image-to-text
tags:
  - image-captioning
languages:
  - en
license: bsd-3-clause
---
## About
Model card for image captioning pretrained on the COCO dataset - base architecture (with ViT base backbone).

*Figure: BLIP.gif, pull figure from the BLIP official repo. Image source: https://github.com/salesforce/BLIP*

Authors from the paper write in the abstract:

*Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released.*
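The card stops short of a usage snippet, so here is a minimal sketch of conditional and unconditional captioning with the Hugging Face `transformers` API (the demo image URL is an assumption for illustration; any RGB image works):

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor (image preprocessing + tokenizer) and the captioning model.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Assumed demo image URL; substitute any RGB image.
img_url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Conditional captioning: the prompt text seeds the caption and the model completes it.
inputs = processor(raw_image, "a photography of", return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: the decoder generates a caption from the image alone.
inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```

In conditional mode the prompt is treated as the start of the caption; in unconditional mode the model describes the image from scratch.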
## Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size (see the half-precision sketch after this list).
- FNI scores are relative rankings and may change as new models are added.
- Data source: [Salesforce/blip-image-captioning-base on Hugging Face](https://huggingface.co/Salesforce/blip-image-captioning-base) (fetched 2025-12-19, adapter version 3.2.0)
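As a concrete illustration of the VRAM point above, a minimal sketch of loading the model in half precision on a GPU (the `float16` dtype and `"cuda"` device are assumptions; int8/int4 quantization would additionally require a library such as bitsandbytes):

```python
import torch
from transformers import BlipProcessor, BlipForConditionalGeneration

# Loading weights in float16 roughly halves VRAM versus float32;
# actual usage still scales with batch size and generation length.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base",
    torch_dtype=torch.float16,  # assumed dtype choice for memory savings
).to("cuda")  # assumes a CUDA device is available
```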
## Related Resources

### Related Papers
The model is introduced in [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) (Li et al., 2022), whose abstract is quoted above.
### Training Datasets
Per the model card above, this checkpoint was pretrained for image captioning on the COCO dataset. Refer to the original model card for further details.
### Related Models

A larger variant of this checkpoint, [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) (ViT-L backbone), is available from the same organization.