🧠 ernie-4.5-21b-a3b-thinking

by baidu • Model ID: hf-model--baidu--ernie-4.5-21b-a3b-thinking

FNI 7.4 (Top 61%, audited)
21.83B Params
128K Context
14.5K Downloads
~18GB Est. VRAM (24GB-class GPU)

⚡ Quick Commands

🦙 Ollama Run
ollama run ernie-4.5-21b-a3b-thinking
🤗 HF Download
huggingface-cli download baidu/ERNIE-4.5-21B-A3B-Thinking
📦 Install Lib
pip install -U transformers
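
For scripted downloads, the same pull can be done with the `huggingface_hub` library. A minimal sketch (assumes `pip install -U huggingface_hub`):

from huggingface_hub import snapshot_download

# Fetch all model files into the local Hugging Face cache
# (equivalent to the huggingface-cli command above).
local_dir = snapshot_download("baidu/ERNIE-4.5-21B-A3B-Thinking")
print(local_dir)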
📊 Engineering Specs

⚡ Hardware

Parameters
21.83B
Architecture
Ernie4_5_MoeForCausalLM
Context Length
128K (131,072)
Model Size
40.7GB

🧠 Lifecycle

Library
-
Precision
float16
Tokenizer
-

🌐 Identity

Source
HuggingFace
License
Open Access
💾 Est. VRAM Benchmark

~17.7GB


* Technical estimate for FP16/Q4 weights plus small fixed allowances; does not account for long-context batching. For technical reference only.

📈 Interest Trend

* Real-time activity index across HuggingFace, GitHub and Research citations.


🖥️ Hardware Compatibility

Multi-Tier Validation Matrix

| Tier | GPU | VRAM | Status |
|------|-----|------|--------|
| 🎮 Entry | RTX 3060 / 4060 Ti | 8GB | Compatible |
| 🎮 Mid | RTX 4070 Super | 12GB | Compatible |
| 💻 High | RTX 4080 / Mac M3 | 16GB | Compatible |
| 🚀 Pro | RTX 3090 / 4090 | 24GB | Compatible |
| 🏗️ Workstation | RTX 6000 Ada | 48GB | Compatible |
| 🏭 Datacenter | A100 / H100 | 80GB | Compatible |
ℹ️ Pro Tip: Compatibility is estimated for 4-bit quantization (Q4). High-precision (FP16) or ultra-long context windows will significantly increase VRAM requirements.
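
For the Q4 assumption above, here is a minimal 4-bit loading sketch using `transformers` with `bitsandbytes` (an illustration, not the vendor's recipe; assumes `pip install bitsandbytes`, and how cleanly the MoE layers quantize is not verified here):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit weights with bf16 compute: roughly the "Q4" budget assumed in the matrix above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "baidu/ERNIE-4.5-21B-A3B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)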

README

ERNIE-4.5-21B-A3B-Thinking

Model Highlights

Over the past three months, we have continued to scale the thinking capability of ERNIE-4.5-21B-A3B, improving both the quality and depth of reasoning, thereby advancing the competitiveness of ERNIE lightweight models in complex reasoning tasks. We are pleased to introduce ERNIE-4.5-21B-A3B-Thinking, featuring the following key enhancements:

  • Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, text generation, and academic benchmarks that typically require human expertise.
  • Efficient tool usage capabilities.
  • Enhanced 128K long-context understanding capabilities.

> [!NOTE]
> This version has an increased thinking length. We strongly recommend its use in highly complex reasoning tasks.

[Figure: benchmark results]

Model Overview

ERNIE-4.5-21B-A3B-Thinking is a text MoE post-trained model, with 21B total parameters and 3B activated parameters for each token. The following are the model configuration details:

| Key | Value |
|-----|-------|
| Modality | Text |
| Training Stage | Post-training |
| Params (Total / Activated) | 21B / 3B |
| Layers | 28 |
| Heads (Q / KV) | 20 / 4 |
| Text Experts (Total / Activated) | 64 / 6 |
| Shared Experts | 2 |
| Context Length | 131,072 |
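
To check these values against the shipped configuration, a quick inspection sketch (the exact attribute names for ernie4_5_moe are assumptions; dump the full dict to confirm):

from transformers import AutoConfig

# Requires transformers >= 4.54.0 for the ernie4_5_moe architecture.
config = AutoConfig.from_pretrained("baidu/ERNIE-4.5-21B-A3B-Thinking")
cfg = config.to_dict()

print(cfg.get("num_hidden_layers"))        # expected: 28
print(cfg.get("max_position_embeddings"))  # expected: 131072
# Key names for expert counts may vary; list whatever mentions "expert".
print({k: v for k, v in cfg.items() if "expert" in k})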

Quickstart

> [!NOTE]
> To align with the wider community, this model releases Transformer-style weights. Both PyTorch and PaddlePaddle ecosystem tools, such as vLLM, transformers, and FastDeploy, are expected to be able to load and run this model.

FastDeploy Inference

Quickly deploy services using FastDeploy as shown below. For more detailed usage, refer to the FastDeploy GitHub Repository.

Note: one 80GB GPU is required. Deploying this model requires FastDeploy version 2.2.

python -m fastdeploy.entrypoints.openai.api_server \
       --model baidu/ERNIE-4.5-21B-A3B-Thinking \
       --port 8180 \
       --metrics-port 8181 \
       --engine-worker-queue-port 8182 \
       --load-choices "default_v1" \
       --tensor-parallel-size 1 \
       --max-model-len 131072 \
       --reasoning-parser ernie_x1 \
       --tool-call-parser ernie_x1 \
       --max-num-seqs 32

The ERNIE-4.5-21B-A3B-Thinking model supports function calling, as in the request below.

curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
-H "Content-Type: application/json" \
-d $'{
  "messages": [
    {
      "role": "user",
      "content": "How \'s the weather in Beijing today?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Determine weather in my location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": [
                "c",
                "f"
              ]
            }
          },
          "additionalProperties": false,
          "required": [
            "location",
            "unit"
          ]
        },
        "strict": true
      }
    }]
}'
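
The same tool-enabled request via the OpenAI Python client, as a sketch (assumes the FastDeploy server above is running and `pip install openai`; confirm the served model name via /v1/models if it differs):

from openai import OpenAI

# The local server does not validate the API key; any placeholder works.
client = OpenAI(base_url="http://0.0.0.0:8180/v1", api_key="none")

response = client.chat.completions.create(
    model="baidu/ERNIE-4.5-21B-A3B-Thinking",
    messages=[{"role": "user", "content": "How's the weather in Beijing today?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Determine weather in my location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string",
                                 "description": "The city and state e.g. San Francisco, CA"},
                    "unit": {"type": "string", "enum": ["c", "f"]},
                },
                "required": ["location", "unit"],
            },
        },
    }],
)
print(response.choices[0].message.tool_calls)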

vLLM Inference

Requires vLLM >= 0.10.2 (excluding 0.11.0):

vllm serve baidu/ERNIE-4.5-21B-A3B-Thinking

Using the reasoning parser and tool-call parser for ERNIE currently requires installing vLLM from the main branch.

Using the `transformers` library

Note: You'll need the `transformers` library (version 4.54.0 or newer) installed to use this model.

The following code snippet illustrates how to use the model to generate content from given inputs.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-Thinking"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
print("generate_text:", generate_text)

License

The ERNIE 4.5 models are provided under the Apache License 2.0. This license permits commercial use, subject to its terms and conditions. Copyright (c) 2025 Baidu, Inc. All Rights Reserved.

Citation

If you find ERNIE 4.5 useful or wish to use it in your projects, please cite our technical report:

@misc{ernie2025technicalreport,
      title={ERNIE 4.5 Technical Report},
      author={Baidu-ERNIE-Team},
      year={2025},
      primaryClass={cs.CL},
      howpublished={\url{https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf}}
}

📝 Limitations & Considerations

  • Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • FNI scores are relative rankings and may change as new models are added.
  • License: Apache 2.0 (per the model tags); verify licensing terms before commercial use.
  • Source: Hugging Face.
📜 Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__baidu__ernie_4.5_21b_a3b_thinking,
  author = {baidu},
  title = {ERNIE-4.5-21B-A3B-Thinking},
  year = {2026},
  howpublished = {\url{https://huggingface.co/baidu/ernie-4.5-21b-a3b-thinking}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
baidu. (2026). ERNIE-4.5-21B-A3B-Thinking [Model]. Free2AITools. https://huggingface.co/baidu/ernie-4.5-21b-a3b-thinking
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology • 📚 Knowledge Base • ℹ️ Verify with original source

🛡️ Model Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id
hf-model--baidu--ernie-4.5-21b-a3b-thinking
author
baidu
tags
transformers, safetensors, ernie4_5_moe, text-generation, ernie4.5, conversational, en, zh, license:apache-2.0, endpoints_compatible, region:us

⚙️ Technical Specs

architecture
Ernie4_5_MoeForCausalLM
params billions
21.83
context length
131,072
vram gb
17.7
vram is estimated
true
vram formula
VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
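
The formula evaluated directly; a heuristic sketch that reproduces the 17.7 GB figure above:

# Heuristic: ~0.75 GB of weights per billion parameters,
# plus fixed allowances for KV cache and OS overhead.
params_b = 21.83
vram_gb = params_b * 0.75 + 0.8 + 0.5
print(f"~{vram_gb:.1f} GB")  # ~17.7 GB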

📊 Engagement & Metrics

likes
769
downloads
14,533

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)