ernie-4.5-21b-a3b-thinking
Quick Commands
ollama run ernie-4.5-21b-a3b-thinking
huggingface-cli download baidu/ernie-4.5-21b-a3b-thinking
pip install -U transformers
Engineering Specs
Est. VRAM Benchmark
~17.7GB
* Technical estimation for FP16/Q4 weights. Does not include OS overhead or long-context batching. For Technical Reference Only.
Hardware Compatibility
Multi-Tier Validation Matrix
RTX 3060 / 4060 Ti
RTX 4070 Super
RTX 4080 / Mac M3
RTX 3090 / 4090
RTX 6000 Ada
A100 / H100
Pro Tip: Compatibility is estimated for 4-bit quantization (Q4). High-precision (FP16) or ultra-long context windows will significantly increase VRAM requirements.
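The gap between Q4 and FP16 noted above can be illustrated with simple bytes-per-parameter arithmetic. This is a rough sketch for weights only: the function name and the precision table are illustrative, the 21.83B parameter count is taken from this page's spec listing, and real quantized formats store extra scale data, so actual files are somewhat larger.

```python
# Approximate bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "q4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    # params_billions * 1e9 params * bytes_per_param / 1e9 bytes per GB
    return params_billions * BYTES_PER_PARAM[precision]


for precision in ("fp16", "q4"):
    print(precision, round(weight_memory_gb(21.83, precision), 1), "GB")
```

KV cache, activations, and runtime overhead come on top of these figures and grow with context length and batch size.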
README
ERNIE-4.5-21B-A3B-Thinking
Model Highlights
Over the past three months, we have continued to scale the thinking capability of ERNIE-4.5-21B-A3B, improving both the quality and depth of reasoning, thereby advancing the competitiveness of ERNIE lightweight models in complex reasoning tasks. We are pleased to introduce ERNIE-4.5-21B-A3B-Thinking, featuring the following key enhancements:
- Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, text generation, and academic benchmarks that typically require human expertise.
- Efficient tool usage capabilities.
- Enhanced 128K long-context understanding capabilities.
[!NOTE] This version has an increased thinking length. We strongly recommend using it for highly complex reasoning tasks.

Model Overview
ERNIE-4.5-21B-A3B-Thinking is a text MoE post-trained model, with 21B total parameters and 3B activated parameters for each token. The following are the model configuration details:
| Key | Value |
|---|---|
| Modality | Text |
| Training Stage | Post-training |
| Params (Total / Activated) | 21B / 3B |
| Layers | 28 |
| Heads (Q / KV) | 20 / 4 |
| Text Experts (Total / Activated) | 64 / 6 |
| Shared Experts | 2 |
| Context Length | 131072 |
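A few ratios follow directly from the configuration table. The sketch below only restates the table's numbers; the class and method names are illustrative, and it assumes the two shared experts run for every token in addition to the six routed ones, which the table does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoeConfig:
    """Values copied from the model configuration table."""
    total_params_b: float = 21.0   # total parameters, in billions
    active_params_b: float = 3.0   # activated parameters per token, in billions
    routed_experts: int = 64       # text experts (total)
    routed_active: int = 6         # text experts (activated)
    shared_experts: int = 2
    q_heads: int = 20
    kv_heads: int = 4

    def active_param_fraction(self) -> float:
        return self.active_params_b / self.total_params_b

    def experts_per_token(self) -> int:
        # Assumption: shared experts are always active, on top of the routed ones.
        return self.routed_active + self.shared_experts

    def queries_per_kv_head(self) -> int:
        # Grouped-query attention: several query heads share one KV head.
        return self.q_heads // self.kv_heads


cfg = MoeConfig()
print(f"{cfg.active_param_fraction():.1%} of parameters active per token")
print(f"{cfg.experts_per_token()} experts evaluated per token in each MoE layer")
print(f"{cfg.queries_per_kv_head()} query heads per KV head")
```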
Quickstart
[!NOTE] To align with the wider community, this model releases Transformer-style weights. Both PyTorch and PaddlePaddle ecosystem tools, such as vLLM, transformers, and FastDeploy, are expected to be able to load and run this model.
FastDeploy Inference
Quickly deploy services using FastDeploy as shown below. For more detailed usage, refer to the FastDeploy GitHub Repository.
Note: 80GB x 1 GPU resources are required. Deploying this model requires FastDeploy version 2.2.
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-21B-A3B-Thinking \
  --port 8180 \
  --metrics-port 8181 \
  --engine-worker-queue-port 8182 \
  --load-choices "default_v1" \
  --tensor-parallel-size 1 \
  --max-model-len 131072 \
  --reasoning-parser ernie_x1 \
  --tool-call-parser ernie_x1 \
  --max-num-seqs 32
The ERNIE-4.5-21B-A3B-Thinking model supports function calling.
curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d $'{
    "messages": [
      {
        "role": "user",
        "content": "How\'s the weather in Beijing today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Determine weather in my location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": [
                  "c",
                  "f"
                ]
              }
            },
            "additionalProperties": false,
            "required": [
              "location",
              "unit"
            ]
          },
          "strict": true
        }
      }]
  }'
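The same request can be built from Python with only the standard library. This is a minimal sketch, not an official client: the helper names are illustrative, and `extract_tool_calls` assumes the server replies in the OpenAI-style shape (`choices[0].message.tool_calls`, with `arguments` as a JSON string), which should be checked against the FastDeploy documentation.

```python
import json
import urllib.request
from typing import List, Tuple

# Tool schema copied from the curl example.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Determine weather in my location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["c", "f"]},
            },
            "additionalProperties": False,
            "required": ["location", "unit"],
        },
        "strict": True,
    },
}


def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "tools": [GET_WEATHER_TOOL],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tool_calls(response: dict) -> List[Tuple[str, dict]]:
    """Return (name, arguments) pairs, assuming an OpenAI-style response body."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]
```

With the server running, `urllib.request.urlopen(build_chat_request("http://0.0.0.0:8180", "How's the weather in Beijing today?"))` sends the request, and the decoded JSON body can be passed to `extract_tool_calls`.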
vLLM Inference
Requires vLLM >= 0.10.2 (excluding 0.11.0).
vllm serve baidu/ERNIE-4.5-21B-A3B-Thinking
The ERNIE reasoning parser and tool-call parser for vLLM require installing vLLM from the main branch.
Using the `transformers` library
Note: You'll need the `transformers` library (version 4.54.0 or newer) installed to use this model.
The following code snippet illustrates how to use the model to generate content from a given input.
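Once `vllm serve` is running, it exposes an OpenAI-compatible HTTP API, by default on port 8000. The sketch below queries it with the standard library only. The helper names are illustrative, and the `reasoning_content` / `reasoning` field names are an assumption about how a reasoning parser surfaces the thinking text; verify them against the vLLM version in use.

```python
import json
import urllib.request
from typing import Optional, Tuple

MODEL = "baidu/ERNIE-4.5-21B-A3B-Thinking"


def build_request(prompt: str,
                  base_url: str = "http://localhost:8000",
                  max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for a vLLM server."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_answer(response: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from an OpenAI-style response body."""
    message = response["choices"][0]["message"]
    # Assumption: the reasoning text, if separated, appears under one of these keys.
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    return reasoning, message.get("content")


def ask(prompt: str) -> Tuple[Optional[str], Optional[str]]:
    """Send the prompt to a running server and return (reasoning, answer)."""
    with urllib.request.urlopen(build_request(prompt)) as reply:
        return read_answer(json.load(reply))
```

If no reasoning parser is enabled, `read_answer` returns `None` for the reasoning part and the full generated text as the answer.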
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-Thinking"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
print("generate_text:", generate_text)
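Because this is a thinking model, the decoded text may contain the reasoning followed by the final answer. The helper below is a hypothetical post-processing sketch: the `<think>` and `</think>` markers are assumptions, not confirmed by this README, so check the model's chat template for the actual delimiters. If the delimiters are special tokens, decoding with `skip_special_tokens=False` may be needed to keep them in the text.

```python
from typing import Tuple


def split_reasoning(text: str,
                    start_marker: str = "<think>",
                    end_marker: str = "</think>") -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer) at the last end_marker.

    Both markers are assumed values; pass the real ones for your model.
    """
    head, sep, tail = text.rpartition(end_marker)
    if not sep:
        # No end marker found: treat everything as the answer.
        return "", text.strip()
    head = head.strip()
    if head.startswith(start_marker):
        head = head[len(start_marker):]
    return head.strip(), tail.strip()


reasoning, answer = split_reasoning("<think>2 + 2 is 4</think> The answer is 4.")
print("reasoning:", reasoning)
print("answer:", answer)
```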
License
The ERNIE 4.5 models are provided under the Apache License 2.0. This license permits commercial use, subject to its terms and conditions. Copyright (c) 2025 Baidu, Inc. All Rights Reserved.
Citation
If you find ERNIE 4.5 useful or wish to use it in your projects, please kindly cite our technical report:
@misc{ernie2025technicalreport,
  title={ERNIE 4.5 Technical Report},
  author={Baidu-ERNIE-Team},
  year={2025},
  primaryClass={cs.CL},
  howpublished={\url{https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf}}
}
Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- License: the README and the Hugging Face tags list Apache License 2.0; verify the licensing terms before commercial use.
- Source: Hugging Face metadata for baidu/ernie-4.5-21b-a3b-thinking.
Cite this model
Academic & Research Attribution
@misc{hf_model__baidu__ernie_4.5_21b_a3b_thinking,
  author = {baidu},
  title = {ERNIE-4.5-21B-A3B-Thinking},
  year = {2026},
  howpublished = {\url{https://huggingface.co/baidu/ernie-4.5-21b-a3b-thinking}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
AI Summary: Based on Hugging Face metadata. Not a recommendation.
Model Transparency Report
Verified data manifest for traceability and transparency.
Identity & Source
- id: hf-model--baidu--ernie-4.5-21b-a3b-thinking
- author: baidu
- tags: transformers, safetensors, ernie4_5_moe, text-generation, ernie4.5, conversational, en, zh, license:apache-2.0, endpoints_compatible, region:us
Technical Specs
- architecture: Ernie4_5_MoeForCausalLM
- params billions: 21.83
- context length: 4,096 (as indexed here; the README lists 131072)
- vram gb: 17.7
- vram is estimated: true
- vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
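The VRAM formula listed above can be checked with a few lines of arithmetic. In this sketch the constants come from the formula as printed on this page, while the function and parameter names are illustrative; it reproduces the page's estimate and is not a measurement.

```python
def estimate_vram_gb(params_billions: float,
                     gb_per_billion: float = 0.75,
                     kv_cache_gb: float = 0.8,
                     os_overhead_gb: float = 0.5) -> float:
    """Apply the page's formula: params * 0.75 + 0.8 GB (KV) + 0.5 GB (OS)."""
    return params_billions * gb_per_billion + kv_cache_gb + os_overhead_gb


# 21.83 * 0.75 = 16.3725, plus 0.8 and 0.5 gives 17.6725
print(round(estimate_vram_gb(21.83), 1))  # 17.7
```

The fixed 0.8 GB KV term does not scale with context length, so the estimate is only meaningful for short contexts and small batches.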
Engagement & Metrics
- likes: 769
- downloads: 14,533