🧠
Model

Glm 5.1 Mlx 2.9bit

by spicyneuron hf-model--spicyneuron--glm-5.1-mlx-2.9bit
Nexus Index
40.2 Top 100%
S: Semantic 50
A: Authority 0
P: Popularity 35
R: Recency 100
Q: Quality 65
Tech Context
2.9 Params
4.096K Ctx
Vital Performance
3.2K DL / 30D
0.0%
Audited 40.2 FNI Score
Tiny 2.9B Params
4k Context
3.2K Downloads
8G GPU ~4GB Est. VRAM
Commercial MIT License
Model Information Summary
Entity Passport
Registry ID hf-model--spicyneuron--glm-5.1-mlx-2.9bit
License MIT
Provider huggingface
💾

Compute Threshold

~3.5GB VRAM

Interactive
Analyze Hardware
â–ŧ

* Static estimation for 4-Bit Quantization.

📜

Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__spicyneuron__glm_5.1_mlx_2.9bit,
  author = {spicyneuron},
  title = {Glm 5.1 Mlx 2.9bit Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/spicyneuron/glm-5.1-mlx-2.9bit}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
spicyneuron. (2026). Glm 5.1 Mlx 2.9bit [Model]. Free2AITools. https://huggingface.co/spicyneuron/glm-5.1-mlx-2.9bit

đŸ”ŦTechnical Deep Dive

Full Specifications [+]

Quick Commands

đŸĻ™ Ollama Run
ollama run glm-5.1-mlx-2.9bit
🤗 HF Download
huggingface-cli download spicyneuron/glm-5.1-mlx-2.9bit

âš–ī¸ Nexus Index V2.0

40.2
TOP 100% SYSTEM IMPACT
Semantic (S) 50
Authority (A) 0
Popularity (P) 35
Recency (R) 100
Quality (Q) 65

đŸ’Ŧ Index Insight

FNI V2.0 for Glm 5.1 Mlx 2.9bit: Semantic (S:50), Authority (A:0), Popularity (P:35), Recency (R:100), Quality (Q:65).

Free2AITools Nexus Index

Verification Authority

Unbiased Data Node Refresh: VFS Live
---

🚀 What's Next?

Technical Deep Dive

GLM 5.1 optimized for MLX. A mixed-precision quant that balances speed, memory, and accuracy.

Usage

sh
# Start server at http://localhost:8080/chat/completions
uvx --from mlx-lm mlx_lm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/GLM-5.1-MLX-2.9bit

Methodology

Quantized with a mlx-lm fork, drawing inspiration from Unsloth/AesSedai/ubergarm style mixed-precision GGUFs. MLX quantization options differ than llama.cpp, but the principles are the same:

  • Sensitive layers like MoE routing, attention, and output embeddings get higher precision
  • More tolerant layers like MoE experts get lower precision

Benchmarks

metric baa-ai/GLM-5.1-RAM-270GB-MLX 2.9bit (this model)
bpw 3.1096 2.9064
peak memory (1024/512) 291.257 272.358
prompt tok/s (1024) 194.958 Âą 0.075 194.216 Âą 0.167
gen tok/s (512) 21.381 Âą 0.050 19.527 Âą 0.035
perplexity 4.780 Âą 0.020 4.118 Âą 0.016
hellaswag 0.546 Âą 0.011 0.59 Âą 0.011
piqa 0.776 Âą 0.01 0.794 Âą 0.009
winogrande 0.668 Âą 0.013 0.695 Âą 0.013

Tested on a Mac Studio M3 Ultra with:

text
mlx_lm.perplexity --sequence-length 2048 --seed 123
mlx_lm.benchmark --prompt-tokens 1024 --generation-tokens 512 --num-trials 5
mlx_lm.evaluate --tasks hellaswag --seed 123 --num-shots 0 --limit 2000
mlx_lm.evaluate --tasks piqa --seed 123 --num-shots 0 --limit 2000
mlx_lm.evaluate --tasks winogrande --seed 123 --num-shots 0 --limit 2000

âš ī¸ Incomplete Data

Some information about this model is not available. Use with Caution - Verify details from the original source before relying on this data.

View Original Source →

📝 Limitations & Considerations

  • â€ĸ Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • â€ĸ VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • â€ĸ FNI scores are relative rankings and may change as new models are added.
  • ⚠ License Unknown: Verify licensing terms before commercial use.

Social Proof

HuggingFace Hub
3.2KDownloads
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology 📚 Knowledge Baseâ„šī¸ Verify with original source

đŸ›Ąī¸ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id
hf-model--spicyneuron--glm-5.1-mlx-2.9bit
slug
spicyneuron--glm-5.1-mlx-2.9bit
source
huggingface
author
spicyneuron
license
MIT
tags
mlx, safetensors, glm_moe_dsa, text-generation, conversational, en, zh, base_model:zai-org/glm-5.1, base_model:quantized:zai-org/glm-5.1, license:mit, 2-bit, region:us

âš™ī¸ Technical Specs

architecture
null
params billions
2.9
context length
4,096
pipeline tag
text-generation
vram gb
3.5
vram is estimated
true
vram formula
VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)

📊 Engagement & Metrics

downloads
3,169
stars
0
forks
0

Data indexed from public sources. Updated daily.