
Cosyvoice3 0.5b Mlx 4bit

by aufklarer (hf-model--aufklarer--cosyvoice3-0.5b-mlx-4bit)
Nexus Index (FNI): 39.7
  • Semantic (S): 50
  • Authority (A): 0
  • Popularity (P): 17
  • Recency (R): 97
  • Quality (Q): 50

Tech Context
  • Parameters: 0.5B
  • Context length: 4,096 tokens

Vital Performance
  • Downloads: 391 (last 30 days)
  • Estimated VRAM: ~2 GB (8 GB GPU)
  • License: Apache-2.0 (permits commercial use)
Model Information Summary
  • Registry ID: hf-model--aufklarer--cosyvoice3-0.5b-mlx-4bit
  • License: Apache-2.0
  • Provider: Hugging Face
Compute Threshold: ~1.7 GB VRAM (static estimate for 4-bit quantization)

Cite this model

BibTeX
```bibtex
@misc{hf_model__aufklarer__cosyvoice3_0.5b_mlx_4bit,
  author = {aufklarer},
  title = {Cosyvoice3 0.5b Mlx 4bit Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/aufklarer/cosyvoice3-0.5b-mlx-4bit}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
```
APA Style
aufklarer. (2026). Cosyvoice3 0.5b Mlx 4bit [Model]. Free2AITools. https://huggingface.co/aufklarer/cosyvoice3-0.5b-mlx-4bit

🔬 Technical Deep Dive

Quick Commands

🦙 Ollama: not applicable (these are MLX safetensors for a text-to-speech pipeline, which Ollama does not run)

🤗 HF Download
```bash
huggingface-cli download aufklarer/cosyvoice3-0.5b-mlx-4bit
```

âš–ī¸ Nexus Index V2.0

39.7
TOP 100% SYSTEM IMPACT
Semantic (S) 50
Authority (A) 0
Popularity (P) 17
Recency (R) 97
Quality (Q) 50

đŸ’Ŧ Index Insight

FNI V2.0 for Cosyvoice3 0.5b Mlx 4bit: Semantic (S:50), Authority (A:0), Popularity (P:17), Recency (R:97), Quality (Q:50).

Free2AITools Nexus Index

Verification Authority

Unbiased Data Node Refresh: VFS Live
---

🚀 What's Next?

Technical Deep Dive

CosyVoice3-0.5B MLX 4-bit

CosyVoice 3 text-to-speech model converted to MLX safetensors format for Apple Silicon inference, with the LLM quantized to 4-bit (the flow-matching decoder and vocoder stay in fp16).

Converted from FunAudioLLM/Fun-CosyVoice3-0.5B-2512.

Swift inference: soniqo/speech-swift

Model Details

| Component | Architecture | Size |
|---|---|---|
| LLM | Qwen2.5-0.5B (24 layers, 896-dim, 14 query / 2 KV heads) | 467 MB (4-bit) |
| DiT Flow Matching | 22-layer DiT (1024-dim, 16 heads, 10 ODE steps) | 634 MB (fp16) |
| HiFi-GAN Vocoder | NSF + F0 predictor + ISTFT | 79 MB (fp16) |
| **Total** | | ~1.2 GB |
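The LLM row fixes the attention geometry, and a few derived numbers follow from it. The sketch below works them out under two assumptions the card does not state: the per-head dimension is the hidden size divided by the query-head count (the usual Qwen2.5 layout), and keys and values are cached in fp16. The 4,096-token figure is the context length listed in the metadata; `AttentionConfig` is a hypothetical helper.

```swift
// Back-of-envelope attention geometry for the Qwen2.5-0.5B LLM stage.
// Assumptions (not stated in the model card): headDim = hidden / queryHeads,
// and the KV cache holds fp16 values (2 bytes per element).
struct AttentionConfig {
    let layers: Int
    let hidden: Int
    let queryHeads: Int
    let kvHeads: Int

    /// Per-head dimension: 896 / 14 = 64.
    var headDim: Int { hidden / queryHeads }

    /// Query heads sharing one KV head under grouped-query attention: 14 / 2 = 7.
    var queriesPerKVHead: Int { queryHeads / kvHeads }

    /// KV cache bytes per token: layers × (K and V) × kvHeads × headDim × bytes per element.
    func kvBytesPerToken(bytesPerElement: Int = 2) -> Int {
        layers * 2 * kvHeads * headDim * bytesPerElement
    }
}

let llm = AttentionConfig(layers: 24, hidden: 896, queryHeads: 14, kvHeads: 2)
let perToken = llm.kvBytesPerToken()   // 24 × 2 × 2 × 64 × 2 = 12,288 bytes
let fullContext = perToken * 4096      // 50,331,648 bytes at 4,096 tokens
print("headDim=\(llm.headDim), queries per KV head=\(llm.queriesPerKVHead)")
print("KV cache: \(perToken) B/token, \(Double(fullContext) / 1_048_576) MiB at 4,096 tokens")
```

Under those assumptions a full 4,096-token cache is about 48 MiB, small next to the ~1.2 GB of weights.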

Pipeline

```text
Text → LLM (Qwen2.5-0.5B) → Speech Tokens (FSQ 6561) → DiT Flow Matching → Mel (80-band) → HiFi-GAN → Audio (24kHz)
```
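The speech-token vocabulary of 6561 equals 3^8, which is consistent with finite scalar quantization (FSQ) over 8 dimensions with 3 levels each. Assuming that layout (the card states only the vocabulary size), a token index converts to and from its 8 quantized coordinates in {-1, 0, 1} by base-3 arithmetic. The digit order and level values below are illustrative assumptions, not taken from the upstream code; `FSQ` is a hypothetical namespace.

```swift
/// Hypothetical FSQ index helpers, assuming 8 dimensions × 3 levels (3^8 = 6561),
/// levels stored as -1, 0, 1, and dimension j weighted by 3^j (least-significant first).
enum FSQ {
    static let dims = 8
    static let levels = 3
    static let vocabularySize = (0..<dims).reduce(1) { acc, _ in acc * levels }  // 6561

    /// Token index -> per-dimension levels in {-1, 0, 1}.
    static func decode(_ index: Int) -> [Int] {
        precondition((0..<vocabularySize).contains(index), "index outside the vocabulary")
        var rest = index
        var code: [Int] = []
        for _ in 0..<dims {
            code.append(rest % levels - 1)  // digit 0, 1, 2 -> level -1, 0, 1
            rest /= levels
        }
        return code
    }

    /// Per-dimension levels -> token index (inverse of `decode`).
    static func encode(_ code: [Int]) -> Int {
        precondition(code.count == dims && code.allSatisfy { (-1...1).contains($0) })
        var index = 0
        var weight = 1
        for level in code {
            index += (level + 1) * weight   // level -1, 0, 1 -> digit 0, 1, 2
            weight *= levels
        }
        return index
    }
}

print(FSQ.vocabularySize)  // 6561
print(FSQ.decode(4))       // [0, 0, -1, -1, -1, -1, -1, -1]
```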
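The DiT stage turns noise into a mel spectrogram by integrating a learned velocity field over 10 ODE steps. The sketch below shows plain fixed-step Euler integration from t = 0 to t = 1 with a caller-supplied velocity closure standing in for the DiT. It illustrates the technique only: the upstream sampler may differ (for example, a non-uniform time schedule or classifier-free guidance, neither of which is shown), and `eulerSample` is a hypothetical name.

```swift
/// Fixed-step Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.
/// `velocity` stands in for the DiT (hypothetical signature); `x0` is the initial noise.
func eulerSample(
    x0: [Float],
    steps: Int = 10,
    velocity: ([Float], Float) -> [Float]
) -> [Float] {
    precondition(steps > 0, "need at least one step")
    let dt = 1.0 / Float(steps)
    var x = x0
    for k in 0..<steps {
        let t = Float(k) * dt
        let v = velocity(x, t)
        precondition(v.count == x.count, "velocity must match the state size")
        // x_{k+1} = x_k + dt * v(x_k, t_k)
        for i in x.indices { x[i] += dt * v[i] }
    }
    return x
}

// Toy check: along the straight path x_t = (1 - t) x0 + t x1 the velocity is the
// constant x1 - x0, so Euler integration lands on x1 up to float rounding.
let noise: [Float] = [0.5, -1.0, 2.0]
let target: [Float] = [1.0, 0.0, -1.0]
let sample = eulerSample(x0: noise) { _, _ in zip(target, noise).map { pair in pair.0 - pair.1 } }
print(sample)
```

In a real sampler the closure would evaluate the network on the current mel estimate at each of the 10 time points.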

Languages

Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian

Files

  • llm.safetensors — LLM weights (4-bit quantized)
  • flow.safetensors — DiT flow matching decoder (fp16)
  • hifigan.safetensors — HiFi-GAN vocoder (fp16, weight-norm folded)
  • config.json — Model configuration

Conversion Details

  • LLM: 4-bit quantization (group_size=64) of attention projections, MLP, and speech head
  • Flow: fp16 (flow matching is sensitive to quantization)
  • HiFi-GAN: fp16 with weight normalization folded (w = g * v / ||v||)
  • Conv1d weights transposed from PyTorch [out, in, kernel] to MLX [out, kernel, in]
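The LLM bullet describes 4-bit quantization with group_size=64. Below is a minimal sketch of group-wise affine quantization, assuming each group of 64 consecutive weights stores a scale and a bias so that w ≈ scale × q + bias with q in 0...15. This is the general affine scheme MLX uses; MLX's exact rounding and its packing of 4-bit codes into 32-bit words are not reproduced. `quantizeGroup`, `dequantize`, and `quantizeRow` are hypothetical names.

```swift
/// One quantized group: 4-bit codes (kept unpacked for clarity) plus affine parameters.
struct QuantizedGroup {
    let codes: [UInt8]   // each in 0...15
    let scale: Float
    let bias: Float
}

/// Affine quantization of one group: q = round((w - min) / scale), scale = (max - min) / 15.
func quantizeGroup(_ w: [Float], bits: Int = 4) -> QuantizedGroup {
    let maxCode = Float((1 << bits) - 1)           // 15 for 4-bit
    let lo = w.min() ?? 0
    let hi = w.max() ?? 0
    // A constant group would give scale = 0; fall back to 1 so every code is 0.
    let scale = hi > lo ? (hi - lo) / maxCode : 1
    let codes = w.map { value -> UInt8 in
        let q = ((value - lo) / scale).rounded()
        return UInt8(min(max(q, 0), maxCode))
    }
    return QuantizedGroup(codes: codes, scale: scale, bias: lo)
}

/// Reconstruct approximate weights: w ≈ scale * q + bias.
func dequantize(_ g: QuantizedGroup) -> [Float] {
    g.codes.map { Float($0) * g.scale + g.bias }
}

/// Quantize a flat weight row in groups of `groupSize` (64 in this conversion).
func quantizeRow(_ row: [Float], groupSize: Int = 64) -> [QuantizedGroup] {
    precondition(row.count % groupSize == 0, "row length must be a multiple of the group size")
    return stride(from: 0, to: row.count, by: groupSize).map { start in
        quantizeGroup(Array(row[start..<(start + groupSize)]))
    }
}

let toyRow = (0..<128).map { Float($0) / 127 - 0.5 }  // toy weights in [-0.5, 0.5]
let groups = quantizeRow(toyRow)
let restored = groups.flatMap(dequantize)
let maxError = zip(toyRow, restored).map { abs($0.0 - $0.1) }.max() ?? 0
print("groups: \(groups.count), max abs error: \(maxError)")
```

Rounding bounds each reconstruction error by half a group's scale. Assuming the scale and bias are each stored in fp16, every group of 64 weights carries 32 extra bits, for an effective 4 + 32/64 = 4.5 bits per weight.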
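Folding weight normalization replaces the (g, v) parameter pair with a single plain weight, so the runtime only needs an ordinary convolution. A minimal sketch of w = g · v / ||v||, assuming PyTorch's default weight_norm layout (dim = 0): one gain per slice along the first axis, with the L2 norm taken over that slice. The flat row-major storage and the name `foldWeightNorm` are illustrative.

```swift
/// Fold weight normalization: for each slice i along the first axis,
/// w[i, ...] = g[i] * v[i, ...] / ||v[i, ...]||  (L2 norm over the slice).
/// `v` is flat, row-major, with `sliceSize` elements per slice (in × kernel for a Conv1d).
func foldWeightNorm(v: [Float], g: [Float], sliceSize: Int) -> [Float] {
    precondition(v.count == g.count * sliceSize, "v must hold g.count slices")
    var w = v
    for i in 0..<g.count {
        let range = (i * sliceSize)..<((i + 1) * sliceSize)
        let norm = v[range].reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
        let factor = norm > 0 ? g[i] / norm : 0   // an all-zero slice stays zero
        for j in range { w[j] = v[j] * factor }
    }
    return w
}

// Two output channels, in = 1, kernel = 2: slices [3, 4] (norm 5) and [0, 2] (norm 2).
let folded = foldWeightNorm(v: [3, 4, 0, 2], g: [10, 0.5], sliceSize: 2)
print(folded)  // [6.0, 8.0, 0.0, 0.5]
```

After folding, each slice's norm equals its gain g[i], which is exactly what the normalized layer computed at run time.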
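MLX's Conv1d expects weights in channels-last order, [out, kernel, in], so the last bullet's conversion swaps the last two axes of PyTorch's [out, in, kernel]. On a flat row-major buffer this is an index permutation; `pytorchToMLXConv1d` is a hypothetical helper sketching it.

```swift
/// Permute a flat row-major Conv1d weight from [out, in, kernel] (PyTorch)
/// to [out, kernel, in] (MLX), i.e. swap the last two axes.
func pytorchToMLXConv1d(_ w: [Float], out: Int, inChannels: Int, kernel: Int) -> [Float] {
    precondition(w.count == out * inChannels * kernel, "shape does not match buffer size")
    var result = [Float](repeating: 0, count: w.count)
    for o in 0..<out {
        for i in 0..<inChannels {
            for k in 0..<kernel {
                let src = (o * inChannels + i) * kernel + k   // [o, i, k] in PyTorch order
                let dst = (o * kernel + k) * inChannels + i   // [o, k, i] in MLX order
                result[dst] = w[src]
            }
        }
    }
    return result
}

// out = 1, in = 2, kernel = 3: each PyTorch row holds one input channel's taps.
let torchWeight: [Float] = [1, 2, 3,   // in-channel 0, taps k = 0...2
                            4, 5, 6]   // in-channel 1
let mlxWeight = pytorchToMLXConv1d(torchWeight, out: 1, inChannels: 2, kernel: 3)
print(mlxWeight)  // [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
```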

Usage

For use with soniqo/speech-swift:

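```swift
import CosyVoiceTTS

// Load the pretrained CosyVoice 3 pipeline.
let model = try await CosyVoiceTTSModel.fromPretrained()

// Synthesize speech; the pipeline produces 24 kHz audio.
let audio = model.synthesize(text: "Hello, how are you?", language: "english")
```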

CLI

```bash
swift run cosyvoice-tts-cli --text "Hello, how are you?" --lang english --output hello.wav
```
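The CLI writes its result to a WAV file via `--output`. If you call the Swift API directly and hold the output as mono Float samples in [-1, 1] at 24 kHz (an assumption about what `synthesize` returns, which the card does not specify), a minimal 16-bit PCM WAV encoder looks like this. `makeWAV` is a hypothetical helper, not part of speech-swift.

```swift
import Foundation

/// Encode mono Float samples (clamped to [-1, 1]) as a 16-bit PCM WAV file in memory.
func makeWAV(samples: [Float], sampleRate: Int = 24_000) -> Data {
    var data = Data()
    // Append an integer in little-endian byte order, as the RIFF format requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    let bytesPerSample = 2
    let dataSize = samples.count * bytesPerSample

    data.append(contentsOf: Array("RIFF".utf8))
    append(UInt32(36 + dataSize))                // size of the rest of the file
    data.append(contentsOf: Array("WAVE".utf8))

    data.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))                           // PCM fmt chunk size
    append(UInt16(1))                            // audio format 1 = PCM
    append(UInt16(1))                            // mono
    append(UInt32(sampleRate))
    append(UInt32(sampleRate * bytesPerSample))  // byte rate
    append(UInt16(bytesPerSample))               // block align
    append(UInt16(16))                           // bits per sample

    data.append(contentsOf: Array("data".utf8))
    append(UInt32(dataSize))
    for s in samples {
        let clamped = max(-1, min(1, s))
        append(Int16((clamped * Float(Int16.max)).rounded()))
    }
    return data
}

// One second of a 440 Hz tone at 24 kHz, standing in for synthesized speech.
let tone = (0..<24_000).map { i -> Float in
    let t = Double(i) / 24_000
    return Float(sin(2 * Double.pi * 440 * t)) * 0.5
}
let wav = makeWAV(samples: tone)
```

Save the result with `try wav.write(to: URL(fileURLWithPath: "hello.wav"))`.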

License

Apache 2.0 (same as upstream CosyVoice 3)

Citation

```bibtex
@article{du2025cosyvoice3,
  title={CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training},
  author={Du, Zhihao and others},
  journal={arXiv preprint arXiv:2505.17589},
  year={2025}
}
```

âš ī¸ Incomplete Data

Some information about this model is not available. Use with Caution - Verify details from the original source before relying on this data.

View Original Source →

📝 Limitations & Considerations

  • Benchmark scores may vary with evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • FNI scores are relative rankings and may change as new models are added.
  • ⚠ License: Apache-2.0 per the model metadata; verify licensing terms before commercial use.


đŸ›Ąī¸ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

  • id: hf-model--aufklarer--cosyvoice3-0.5b-mlx-4bit
  • slug: aufklarer--cosyvoice3-0.5b-mlx-4bit
  • source: huggingface
  • author: aufklarer
  • license: Apache-2.0
  • tags: mlx, cosyvoice3, tts, text-to-speech, speech-synthesis, apple-silicon, cosyvoice, zh, en, ja, ko, de, es, fr, it, ru, arxiv:2505.17589, base_model:funaudiollm/fun-cosyvoice3-0.5b-2512, license:apache-2.0, region:us, voice-cloning, zero-shot

âš™ī¸ Technical Specs

architecture
null
params billions
0.5
context length
4,096
pipeline tag
text-to-speech
vram gb
1.7
vram is estimated
true
vram formula
VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
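Worked through for this model (0.5 billion parameters), the formula gives 0.375 + 0.8 + 0.5 = 1.675, which rounds to the 1.7 GB estimate listed above. The sketch below evaluates it; the constants are the site's heuristic, not measurements, and `estimatedVRAM` is a hypothetical name. Its weight term (0.375 GB) is smaller than the ~1.2 GB of weight files listed under Model Details, where the DiT and HiFi-GAN components are stored in fp16, so treat the result as a rough guide.

```swift
/// Site heuristic: VRAM (GB) ≈ params (billions) × 0.75 + 0.8 GB (KV cache) + 0.5 GB (OS overhead).
func estimatedVRAM(paramsBillions: Double,
                   gbPerBillionParams: Double = 0.75,
                   kvCacheGB: Double = 0.8,
                   osOverheadGB: Double = 0.5) -> Double {
    paramsBillions * gbPerBillionParams + kvCacheGB + osOverheadGB
}

let vram = estimatedVRAM(paramsBillions: 0.5)
let rounded = (vram * 10).rounded() / 10   // one decimal place, as shown in the metadata
print("raw: \(vram) GB, rounded: \(rounded) GB")
```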

📊 Engagement & Metrics

  • downloads: 391
  • stars: 0
  • forks: 0
0

Data indexed from public sources. Updated daily.