Gigachat3 702b A36b Preview
| Entity Passport | |
|---|---|
| Registry ID | hf-model--ai-sage--gigachat3-702b-a36b-preview |
| License | MIT |
| Provider | huggingface |
Compute Threshold
~529GB VRAM
* Static estimation for 4-Bit Quantization. [Multi-GPU / Unified Memory Required]
Cite this model
Academic & Research Attribution
@misc{hf_model__ai_sage__gigachat3_702b_a36b_preview,
author = {Ai Sage},
title = {Gigachat3 702b A36b Preview Model},
year = {2026},
howpublished = {\url{https://huggingface.co/ai-sage/gigachat3-702b-a36b-preview}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Technical Deep Dive
Quick Commands
huggingface-cli download ai-sage/gigachat3-702b-a36b-preview
pip install -U transformers
Nexus Index V2.0
Index Insight
FNI V2.0 for Gigachat3 702b A36b Preview: Semantic (S:50), Authority (A:0), Popularity (P:23), Recency (R:73), Quality (Q:50).
GigaChat 3 Ultra Preview
Introducing GigaChat 3 Ultra Preview, the flagship instruct model of the GigaChat family.
The model is based on a Mixture-of-Experts (MoE) architecture with 702B total and 36B active parameters.
The architecture includes Multi-head Latent Attention (MLA) and Multi-Token Prediction (MTP), which optimize the model for high throughput at inference time.
This version is intended for high-performance inference in fp8; the bf16 model is available as GigaChat3-702B-A36B-preview-bf16.
More details are in the Habr article.
Model architecture
GigaChat 3 Ultra Preview uses a custom MoE architecture:
Multi-head Latent Attention (MLA)
Instead of standard Multi-head Attention, the model uses MLA. MLA makes inference efficient by compressing the Key-Value (KV) cache into a latent vector, which significantly reduces memory requirements and speeds up processing.
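To make the memory saving concrete, here is a minimal sketch that compares how many values per token and per layer a standard multi-head cache stores against a latent cache. It assumes the DeepSeek-V2 formulation of MLA, in which the cache holds one compressed KV vector plus a small positional key. Every dimension below is hypothetical and chosen for illustration only; none of them is read from the GigaChat 3 configuration.

```python
def mha_cache_elems_per_token(n_heads: int, head_dim: int) -> int:
    """Standard multi-head attention caches a full key and a full value per head."""
    return 2 * n_heads * head_dim


def mla_cache_elems_per_token(latent_dim: int, rope_dim: int) -> int:
    """A latent cache stores one compressed KV vector plus a small positional key."""
    return latent_dim + rope_dim


# Hypothetical dimensions for illustration only; not taken from the model config.
N_HEADS, HEAD_DIM = 128, 128
LATENT_DIM, ROPE_DIM = 512, 64

mha = mha_cache_elems_per_token(N_HEADS, HEAD_DIM)
mla = mla_cache_elems_per_token(LATENT_DIM, ROPE_DIM)
print(f"MHA: {mha} elements per token per layer")
print(f"MLA: {mla} elements per token per layer")
print(f"Reduction: {mha / mla:.1f}x")
```

The actual ratio depends on the real head count and latent width, so treat the printed figure as an order-of-magnitude illustration rather than a measurement.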
Multi-Token Prediction (MTP)
The model is trained with a Multi-Token Prediction (MTP) objective. This lets the model predict several tokens in a single pass, which speeds up generation by up to 40% when combined with speculative or parallel decoding techniques.
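As a rough way to reason about where such a speedup comes from, the sketch below models the idealized case of one drafted token per decoding step: each step always yields one token, plus the drafted one when it is accepted. The acceptance rate and the overhead fraction are hypothetical inputs, not figures published for this model, and the formula ignores batching and scheduling effects.

```python
def expected_speedup(accept_rate: float, overhead: float = 0.0) -> float:
    """Idealized speedup from one speculative draft token per decoding step.

    accept_rate: probability that the drafted token is accepted (hypothetical).
    overhead: extra cost of drafting and verifying, as a fraction of one
              ordinary forward pass (hypothetical).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if overhead < 0.0:
        raise ValueError("overhead must be non-negative")
    tokens_per_step = 1.0 + accept_rate  # one guaranteed token plus the draft if accepted
    cost_per_step = 1.0 + overhead       # relative to plain autoregressive decoding
    return tokens_per_step / cost_per_step


for rate in (0.4, 0.6, 0.8):
    print(f"accept={rate:.1f}, overhead=0.2 -> {expected_speedup(rate, 0.2):.2f}x")
```

Under this simplified model the gain shrinks quickly as drafting overhead grows, which is one reason a reported figure is phrased as "up to".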
Training data
We added 10 languages to the dataset, from Chinese and Arabic to Uzbek and Kazakh, and expanded the set of sources: books, academic data, and code and math datasets. All data goes through deduplication, language filtering, and automatic quality checks using heuristics and classifiers. Synthetic data made a key contribution to quality: we generated about 5.5 trillion tokens of synthetic data. The corpus includes question-answer pairs for texts, reverse-prompt chains for structuring data, LLM notes with model comments inside texts, and millions of synthetic problems with solutions in mathematics and competitive programming (with synthetic tests) based on PromptCot.
Benchmarks
| Metric | GigaChat 3 Ultra | GigaChat 2 Max |
|---|---|---|
| MERA text | 0.683 | 0.663 |
| MERA industrial | 0.645 / 0.824 | - |
| MERA code | 0.338 | - |
| AUTOLOGI_EN_ZERO_SHOT | 0.6857 | 0.6489 |
| GPQA_COT_ZERO_SHOT | 0.5572 | 0.4714 |
| HUMAN_EVAL_PLUS_ZERO_SHOT | 0.8659 | 0.7805 |
| LBPP_PYTHON_ZERO_SHOT | 0.5247 | 0.4753 |
| MMLU_PRO_EN_FIVE_SHOT | 0.7276 | 0.6655 |
| GSM8K_FIVE_SHOT | 0.9598 | 0.9052 |
| MATH_500_FOUR_SHOT | 0.7840 | 0.7160 |
How to verify the model's metrics
# lm-eval[api]==0.4.9.1
# sglang[all]==0.5.5
# or
# vllm==0.11.2
export HF_ALLOW_CODE_EVAL=1
# sglang server up
# 702B
python -m sglang.launch_server --model-path --host 127.0.0.1 --port 30000 --nnodes 2 --node-rank <0/1> --tp 16 --ep 16 --dtype auto --mem-fraction-static 0.7 --trust-remote-code --allow-auto-truncate --speculative-algorithm EAGLE --speculative-num-steps 1 --speculative-eagle-topk 1 --speculative-num-draft-tokens 2 --dist-init-addr :50000
# mmlu pro check
python -m lm_eval --model sglang-generate --output_path --batch_size 16 --model_args base_url=http://127.0.0.1:30000/generate,num_concurrent=16,tokenized_requests=True,max_length=131072,tokenizer= --trust_remote_code --confirm_run_unsafe_code --num_fewshot 5 --tasks mmlu_pro
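Once a server like the one launched above is listening on 127.0.0.1:30000, a quick smoke test can be sent to the same `/generate` endpoint that the evaluation command targets. The sketch below uses only the standard library and assumes SGLang's native request shape (a `text` field plus `sampling_params`) and a `text` field in the JSON response. The helper names `build_generate_request` and `generate` are hypothetical, and the raw-text endpoint does not apply a chat template, so this checks connectivity rather than answer quality.

```python
import json
import urllib.request

# Address taken from the launch command above; adjust it for your deployment.
BASE_URL = "http://127.0.0.1:30000/generate"


def build_generate_request(prompt: str, max_new_tokens: int = 32) -> urllib.request.Request:
    """Build a POST request for the server's /generate endpoint."""
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0.0},
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_new_tokens: int = 32, timeout: float = 60.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

Calling `generate("2 + 2 =")` should return a short continuation if the server is up; a connection error at this point usually means the server has not finished loading the weights.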
Inference and deployment
GigaChat 3 Ultra Preview targets cluster and on-prem scenarios with serious infrastructure.
Key points:
- support for popular inference engines (vLLM, SGLang, LMDeploy, TensorRT-LLM, and others);
- BF16 and FP8 modes (for FP8, a separate build and GPU configuration recommendations);
- use of MLA and MTP to reduce the KV cache and speed up generation;
- a proxy and gateway layer for integration with external services, tools, and agent frameworks.
For configuration, you can refer to the published guides for models of similar scale:
- DeepSeek-V3, the How to run locally section of the official model card:
- Kimi-K2-Instruct, deployment recommendations (vLLM / SGLang / LMDeploy):
Incomplete Data
Some information about this model is not available. Use with caution and verify details against the original source before relying on this data.
Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- License: listed as MIT in the upstream metadata. Verify licensing terms before commercial use.
AI Summary: Based on Hugging Face metadata. Not a recommendation.
Model Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-model--ai-sage--gigachat3-702b-a36b-preview
- slug: ai-sage--gigachat3-702b-a36b-preview
- source: huggingface
- author: Ai Sage
- license: MIT
- tags: transformers, safetensors, deepseek_v3, text-generation, moe, conversational, ru, en, license:mit, text-generation-inference, endpoints_compatible, fp8, region:us
Technical Specs
- architecture: null
- params billions: 702
- context length: 4,096
- pipeline tag: text-generation
- vram gb: 529
- vram is estimated: true
- vram formula: VRAM ≈ (params * 0.75) + 2GB (KV) + 0.5GB (OS)
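The listed estimate can be reproduced directly from the formula. The sketch below is a plain transcription of it, with the parameter count expressed in billions; the function name `estimate_vram_gb` and the argument names are hypothetical, and the constants are the ones stated in the formula rather than measured values.

```python
def estimate_vram_gb(params_billions: float,
                     gb_per_billion_params: float = 0.75,
                     kv_cache_gb: float = 2.0,
                     os_overhead_gb: float = 0.5) -> float:
    """Static VRAM estimate following the formula listed in the specs."""
    return params_billions * gb_per_billion_params + kv_cache_gb + os_overhead_gb


print(estimate_vram_gb(702))  # 529.0
```

Because the KV-cache and OS terms are fixed constants, the estimate does not account for context length or batch size and is best read as a lower bound.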
Engagement & Metrics
- downloads: 683
- stars: 0
- forks: 0
Data indexed from public sources. Updated daily.