🧠 Model

distilbert-base-uncased

by distilbert

Language: en · Tags: exbert · License: apache-2.0 · Datasets: bookcorpus, wikipedia

πŸ• Updated 12/18/2025

🧠 Architecture Explorer

Neural network architecture, broken into four stages (inspected programmatically in the sketch below):

1. Input Layer
2. Hidden Layers
3. Attention
4. Output Layer
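
As a concrete complement to the explorer, the sketch below (our own illustration, not part of the original card) reads the dimensions behind those stages from the published configuration using the transformers AutoConfig API.

```python
from transformers import AutoConfig

# Load the published configuration for distilbert-base-uncased.
config = AutoConfig.from_pretrained("distilbert-base-uncased")

# DistilBertConfig exposes the sizes behind the stages above:
# hidden layers (n_layers), attention heads (n_heads), and hidden size (dim).
print("transformer layers:", config.n_layers)
print("attention heads per layer:", config.n_heads)
print("hidden size:", config.dim)
print("vocabulary size:", config.vocab_size)
```

For distilbert-base-uncased this reports 6 transformer layers, 12 attention heads per layer, and a hidden size of 768.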

About

This model is a distilled version of the BERT base model, introduced in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" (Sanh et al., 2019). The code for the distillation process is available in the Hugging Face transformers repository. This model is uncased: it does not make a difference between english and English. DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher.
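
The usual quick-start for this checkpoint is masked-language-modelling via the transformers fill-mask pipeline; the snippet below is a minimal sketch of that usage (the example sentence is our own) and also illustrates the uncased behaviour, since the tokenizer lowercases its input.

```python
from transformers import pipeline

# Masked-language-modelling pipeline backed by distilbert-base-uncased.
unmasker = pipeline("fill-mask", model="distilbert-base-uncased")

# The model is uncased, so "Hello" and "hello" are tokenized identically.
for prediction in unmasker("Hello, I'm a [MASK] model."):
    print(prediction["token_str"], round(prediction["score"], 3))
```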

πŸ“ Limitations & Considerations

  • Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size (see the sketch after this list).
  • FNI scores are relative rankings and may change as new models are added.
  • Data source: Hugging Face, https://huggingface.co/distilbert/distilbert-base-uncased (fetched 2025-12-18, adapter version 3.2.0).
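
To make the VRAM caveat concrete, the following sketch (our own illustration, not from the original card) estimates the memory taken by the model weights alone at a few precisions; real usage is higher once activations, batch size, and any quantization overhead are included.

```python
from transformers import AutoModel

# Load the weights once to count parameters.
model = AutoModel.from_pretrained("distilbert-base-uncased")
n_params = sum(p.numel() for p in model.parameters())

# Weight memory only; activations and optimizer state are extra.
for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    gib = n_params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.2f} GiB for {n_params / 1e6:.0f}M parameters")
```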

📚 Related Resources

📄 Related Papers

No related papers linked yet. Check the model's official documentation for research papers.

📊 Training Datasets

The model card metadata lists bookcorpus and wikipedia as pretraining datasets; refer to the original model card for details.

🔗 Related Models

Data unavailable

🚀 What's Next?