🧠

ernie-4.5-21b-a3b-thinking

Name: ernie-4.5-21b-a3b-thinking
Author: baidu

by baidu Model ID: hf-model--baidu--ernie-4.5-21b-a3b-thinking

FNI 7.4

Top 61%


Audited 7.4 FNI Score

21.83B Params

128K Context

14.5K Downloads

24G GPU ~18GB Est. VRAM

⚡ Quick Commands

🦙 Ollama Run

ollama run ernie-4.5-21b-a3b-thinking

🤗 HF Download

huggingface-cli download baidu/ernie-4.5-21b-a3b-thinking

📦 Install Lib

pip install -U transformers

📊

Engineering Specs

V16.2 Platform Optimized

⚡ Hardware

Parameters

21.83B

Architecture

Ernie4_5_MoeForCausalLM

Context Length

Model Size

40.7GB

🧠 Lifecycle

Library

Precision

float16

Tokenizer

🌐 Identity

Source

HuggingFace

License

Open Access

💾

Est. VRAM Benchmark

~17.7GB


* Technical estimation for FP16/Q4 weights. Does not include OS overhead or long-context batching. For Technical Reference Only.

📈 Interest Trend

* Real-time activity index across HuggingFace, GitHub and Research citations.

🔍 Semantic Keywords

🏷️ transformers 🏷️ safetensors 🏷️ ernie4_5_moe 🏷️ text-generation 🏷️ ernie4.5 🏷️ conversational 🏷️ en 🏷️ zh 🏷️ license:apache-2.0 🏷️ endpoints_compatible 🏷️ region:us


Social Proof

FNI Rank: Top 61%

HuggingFace Hub

769 Likes

14.5K Downloads

⚙️ Technical Specifications

4 specs

🧠

Parameters

21.83B

📏

Context

🏗️

Architecture

Ernie4_5_MoeForCausalLM

📚

Library

transformers

🚀 Deployment Info

Difficulty

💎Expert

Recommended Hardware

☁️ Multi-GPU or cloud A100/H100

Quick Info

Library: transformers
Size: 43.7 GB

Model Information Summary
Identity	ernie-4.5-21b-a3b-thinking
Author	baidu
Primary Category	Standard
Downloads	14,533
Likes	769
Source	HuggingFace
Technical Specifications
Architecture	Ernie4_5_MoeForCausalLM



🖥️

Hardware Compatibility

Multi-Tier Validation Matrix

Tier	VRAM	Example Hardware	Status
Entry	8GB	RTX 3060 / 4060 Ti	Compatible
Mid	12GB	RTX 4070 Super	Compatible
High	16GB	RTX 4080 / Mac M3	Compatible
Pro	24GB	RTX 3090 / 4090	Compatible
Workstation	48GB	RTX 6000 Ada	Compatible
Datacenter	80GB	A100 / H100	Compatible

ℹ️

Pro Tip: Compatibility is estimated for 4-bit quantization (Q4). High-precision (FP16) or ultra-long context windows will significantly increase VRAM requirements.
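To make the precision trade-off concrete, here is a rough sketch of raw weight size at different precisions. It uses the 21.83B parameter count listed on this page; the 4-bit figure of 0.5 bytes per parameter is an idealized assumption that ignores quantization scales and metadata, and the helper name `weight_size` is illustrative only.

```python
PARAMS = 21.83e9  # parameter count listed on this page

# Bytes per parameter for each storage format.
# FP16/BF16 use 2 bytes; the 4-bit figure is an idealized assumption
# that ignores quantization scales and other metadata.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "q4 (idealized)": 0.5}


def weight_size(params: float, bytes_per_param: float) -> tuple[float, float]:
    """Return (decimal GB, binary GiB) for the raw weights only."""
    total_bytes = params * bytes_per_param
    return total_bytes / 1e9, total_bytes / 2**30


for name, bpp in BYTES_PER_PARAM.items():
    gb, gib = weight_size(PARAMS, bpp)
    print(f"{name:>15}: {gb:5.1f} GB  ({gib:5.1f} GiB)")
```

Under these assumptions the FP16 weights come to about 43.7 GB, or about 40.7 GiB, which may explain why both figures appear as the model size on this page. Activations and the KV cache come on top of the weights.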

README

ERNIE-4.5-21B-A3B-Thinking

Model Highlights

Over the past three months, we have continued to scale the thinking capability of ERNIE-4.5-21B-A3B, improving both the quality and depth of reasoning, thereby advancing the competitiveness of ERNIE lightweight models in complex reasoning tasks. We are pleased to introduce ERNIE-4.5-21B-A3B-Thinking, featuring the following key enhancements:

Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, text generation, and academic benchmarks that typically require human expertise.
Efficient tool usage capabilities.
Enhanced 128K long-context understanding capabilities.

Note: This version has an increased thinking length. We strongly recommend its use in highly complex reasoning tasks.

[Figure: benchmark results (benchmark.png)]

Model Overview

ERNIE-4.5-21B-A3B-Thinking is a text MoE post-trained model, with 21B total parameters and 3B activated parameters for each token. The following are the model configuration details:

Key	Value
Modality	Text
Training Stage	Post-training
Params (Total / Activated)	21B / 3B
Layers	28
Heads(Q/KV)	20 / 4
Text Experts(Total / Activated)	64 / 6
Shared Experts	2
Context Length	131072
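The "64 / 6" expert row means that each token is processed by only a small subset of the routed experts. The sketch below illustrates generic top-k gating with softmax-normalized weights, using the expert counts from the table. It is not ERNIE's actual router, whose scoring and normalization may differ; the function name `route_token` and the random logits are illustrative assumptions.

```python
import math
import random

NUM_EXPERTS = 64   # routed text experts (from the table)
TOP_K = 6          # routed experts activated per token (from the table)
NUM_SHARED = 2     # shared experts (from the table), assumed always active


def route_token(router_logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Pick the top_k experts by logit and softmax-normalize their weights.

    Generic top-k gating for illustration; ERNIE's real router may differ.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in scores
weights = route_token(logits)
print(f"routed experts chosen: {sorted(weights)}")
print(f"{len(weights)} routed + {NUM_SHARED} shared experts active for this token")
```

Because only the selected experts run for each token, roughly 3B of the 21B parameters are exercised per token, which is what the "A3B" in the model name refers to.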

Quickstart

Note: To align with the wider community, this model is released with Transformer-style weights. Tools from both the PyTorch and PaddlePaddle ecosystems, such as vLLM, transformers, and FastDeploy, are expected to be able to load and run it.

FastDeploy Inference

Quickly deploy services using FastDeploy as shown below. For more detailed usage, refer to the FastDeploy GitHub Repository.

Note: A single 80GB GPU is required. Deploying this model requires FastDeploy version 2.2.

python -m fastdeploy.entrypoints.openai.api_server \
       --model baidu/ERNIE-4.5-21B-A3B-Thinking \
       --port 8180 \
       --metrics-port 8181 \
       --engine-worker-queue-port 8182 \
       --load-choices "default_v1" \
       --tensor-parallel-size 1 \
       --max-model-len 131072 \
       --reasoning-parser ernie_x1 \
       --tool-call-parser ernie_x1 \
       --max-num-seqs 32

The ERNIE-4.5-21B-A3B-Thinking model supports function calling. For example:

curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
-H "Content-Type: application/json" \
-d $'{
  "messages": [
    {
      "role": "user",
      "content": "How\'s the weather in Beijing today?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Determine weather in my location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": [
                "c",
                "f"
              ]
            }
          },
          "additionalProperties": false,
          "required": [
            "location",
            "unit"
          ]
        },
        "strict": true
      }
    }]
}'
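Once the server answers a request like the one above, the client has to run the requested tool and send the result back. The following is a minimal sketch of that step, assuming the response follows the OpenAI chat-completions shape (`choices[0].message.tool_calls`, with `arguments` as a JSON string). The local `get_weather` implementation, the `TOOLS` registry, and the example response are all hypothetical and hand-written, not real server output.

```python
import json


def get_weather(location: str, unit: str) -> dict:
    """Hypothetical local implementation of the `get_weather` tool."""
    return {"location": location, "unit": unit, "temperature": None}


# Registry mapping tool names from the request to local callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_calls(response: dict) -> list[dict]:
    """Run each tool call in a chat-completion response and build `tool` messages."""
    message = response["choices"][0]["message"]
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = TOOLS[fn["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call.get("id"),
            "content": json.dumps(output),
        })
    return results


# Hand-written example in the assumed response shape, not real server output.
example_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"location": "Beijing", "unit": "c"}),
                },
            }],
        }
    }]
}

print(dispatch_tool_calls(example_response))
```

The returned `tool` messages would be appended to the conversation, after the assistant message that requested them, and sent in a follow-up request so the model can produce its final answer.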

vLLM inference

Requires vLLM >= 0.10.2 (excluding 0.11.0).

vllm serve baidu/ERNIE-4.5-21B-A3B-Thinking
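`vllm serve` exposes an OpenAI-compatible HTTP API, by default on port 8000. A minimal standard-library client might look like the sketch below; it assumes the server is reachable at `localhost:8000` with default settings, and the helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:8000/v1",  # assumed vLLM default address
    model: str = "baidu/ERNIE-4.5-21B-A3B-Thinking",
    max_tokens: int = 1024,
) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the reply text; needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Give me a short introduction to large language models."))` would print the model's reply.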

The reasoning-parser and tool-call-parser for ERNIE in vLLM require installing vLLM from the main branch.

Using `transformers` library

Note: You'll need the `transformers` library (version 4.54.0 or newer) installed to use this model.

The following code snippet illustrates how to use the model to generate content from a given input.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-Thinking"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
print("generate_text:", generate_text)

License

The model is tagged license:apache-2.0 on Hugging Face.

Citation

If you find ERNIE 4.5 useful or wish to use it in your projects, please kindly cite our technical report:

@misc{ernie2025technicalreport,
      title={ERNIE 4.5 Technical Report},
      author={Baidu-ERNIE-Team},
      year={2025},
      primaryClass={cs.CL},
      howpublished={\url{https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf}}
}



📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
• License: tagged apache-2.0 on Hugging Face; verify licensing terms before commercial use.
• Source: Hugging Face

📜

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__baidu__ernie_4.5_21b_a3b_thinking,
  author = {baidu},
  title = {ernie-4.5-21b-a3b-thinking},
  year = {2026},
  howpublished = {\url{https://huggingface.co/baidu/ernie-4.5-21b-a3b-thinking}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

baidu. (2026). ernie-4.5-21b-a3b-thinking [Model]. Free2AITools. https://huggingface.co/baidu/ernie-4.5-21b-a3b-thinking

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


🛡️ Model Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: hf-model--baidu--ernie-4.5-21b-a3b-thinking
author: baidu
tags: transformers, safetensors, ernie4_5_moe, text-generation, ernie4.5, conversational, en, zh, license:apache-2.0, endpoints_compatible, region:us

⚙️ Technical Specs

architecture: Ernie4_5_MoeForCausalLM
params billions: 21.83
context length: 131,072
vram gb: 17.7
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
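As a quick check, the listed formula can be evaluated directly. The sketch below is a literal transcription of it, treating the 0.75 factor as GB per billion parameters; the function name and argument names are illustrative.

```python
def estimate_vram_gb(
    params_billions: float,
    gb_per_billion_params: float = 0.75,  # weight term from the listed formula
    kv_cache_gb: float = 0.8,             # fixed KV-cache allowance
    overhead_gb: float = 0.5,             # fixed OS allowance
) -> float:
    """Evaluate VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)."""
    return params_billions * gb_per_billion_params + kv_cache_gb + overhead_gb


print(f"{estimate_vram_gb(21.83):.1f} GB")  # prints 17.7 GB
```

This reproduces the 17.7 GB figure above. The KV-cache term is a fixed allowance that does not grow with context length, so long prompts will need more memory than this estimate suggests.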

📊 Engagement & Metrics

likes: 769
downloads: 14,533

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)
