CosyVoice3-0.5B MLX 4-bit

Name: CosyVoice3-0.5B MLX 4-bit
Author: aufklarer

Nexus Index

39.7 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 17

R: Recency 97

Q: Quality 50

Tech Context

0.5B Params

4,096 Ctx

Vital Performance

391 DL / 30D

8G GPU ~2GB Est. VRAM

Commercial APACHE License

Model Information Summary
Entity Passport
Registry ID	hf-model--aufklarer--cosyvoice3-0.5b-mlx-4bit
License	Apache-2.0
Provider	huggingface

💾

Compute Threshold

~1.7GB VRAM

* Static estimation for 4-Bit Quantization.

📜

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__aufklarer__cosyvoice3_0.5b_mlx_4bit,
  author = {aufklarer},
  title = {Cosyvoice3 0.5b Mlx 4bit Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/aufklarer/cosyvoice3-0.5b-mlx-4bit}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

aufklarer. (2026). Cosyvoice3 0.5b Mlx 4bit [Model]. Free2AITools. https://huggingface.co/aufklarer/cosyvoice3-0.5b-mlx-4bit

🔬Technical Deep Dive

Quick Commands

🦙 Ollama Run

ollama run cosyvoice3-0.5b-mlx-4bit

🤗 HF Download

huggingface-cli download aufklarer/cosyvoice3-0.5b-mlx-4bit

---

Technical Deep Dive

CosyVoice3-0.5B MLX 4-bit

CosyVoice 3 text-to-speech model converted to MLX safetensors format with 4-bit quantization for Apple Silicon inference.

Converted from FunAudioLLM/Fun-CosyVoice3-0.5B-2512.

Swift inference: soniqo/speech-swift

Model Details

Component	Architecture	Size
LLM	Qwen2.5-0.5B (24L, 896d, 14Q/2KV heads)	467 MB (4-bit)
DiT Flow Matching	22-layer DiT (1024d, 16 heads, 10 ODE steps)	634 MB (fp16)
HiFi-GAN Vocoder	NSF + F0 predictor + ISTFT	79 MB (fp16)
Total		~1.2 GB
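The total in the table can be checked by summing the component sizes. The sketch below assumes decimal units (1 GB = 1000 MB); the dictionary keys are illustrative names, not file or tensor names from the repository.

```python
# Component sizes in MB, copied from the Model Details table.
# The keys are illustrative labels, not identifiers from the repository.
sizes_mb = {"llm_4bit": 467, "dit_flow_fp16": 634, "hifigan_fp16": 79}

total_mb = sum(sizes_mb.values())
total_gb = total_mb / 1000  # assumes decimal GB

# Share of the total download taken by each component, in percent.
shares = {name: round(100 * mb / total_mb, 1) for name, mb in sizes_mb.items()}

print(total_mb, round(total_gb, 1), shares)
```

Because only the LLM is quantized, the fp16 flow decoder ends up as the largest component.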

Pipeline

text

Text → LLM (Qwen2.5-0.5B) → Speech Tokens (FSQ 6561) → DiT Flow Matching → Mel (80-band) → HiFi-GAN → Audio (24kHz)
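The speech-token vocabulary of 6561 equals 3^8, which is what a finite scalar quantizer (FSQ) with eight dimensions and three levels per dimension would produce. The sketch below shows how such an index could map to per-dimension levels. The factorization, the digit order, and the function names are assumptions for illustration and are not taken from the model's code.

```python
def fsq_index_to_levels(index, dims=8, levels=3):
    """Decode a token index into `dims` quantized levels in {-1, 0, 1}.

    Assumes a base-`levels` encoding with the least significant digit first.
    """
    if not 0 <= index < levels ** dims:
        raise ValueError("index outside the codebook")
    out = []
    for _ in range(dims):
        index, digit = divmod(index, levels)
        out.append(digit - levels // 2)  # shift 0,1,2 to -1,0,1
    return out

def fsq_levels_to_index(values, levels=3):
    """Inverse of fsq_index_to_levels."""
    index = 0
    for value in reversed(values):
        index = index * levels + (value + levels // 2)
    return index

codebook_size = 3 ** 8
print(codebook_size, fsq_index_to_levels(6560))
```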

Languages

Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian

Files

llm.safetensors — LLM weights (4-bit quantized)
flow.safetensors — DiT flow matching decoder (fp16)
hifigan.safetensors — HiFi-GAN vocoder (fp16, weight-norm folded)
config.json — Model configuration

Conversion Details

LLM: 4-bit quantization (group_size=64) of attention projections, MLP, and speech head
Flow: fp16 (flow matching is sensitive to quantization)
HiFi-GAN: fp16 with weight normalization folded (w = g * v / ||v||)
Conv1d weights transposed from PyTorch [out, in, kernel] to MLX [out, kernel, in]
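The conversion steps above can be sketched in plain Python on nested lists. This is not the conversion script used for this model: the quantizer is a generic per-group affine scheme rather than MLX's exact kernel, the weight norm is assumed to be taken per output channel (the PyTorch default), and all function names are hypothetical.

```python
import math

def quantize_group(values, bits=4):
    """Affine-quantize one group of weights to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1           # 15 steps for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # avoid a zero scale for constant groups
    codes = [round((x - lo) / scale) for x in values]
    return codes, scale, lo            # lo serves as the per-group bias

def dequantize_group(codes, scale, bias):
    """Recover approximate weights from codes, scale, and bias."""
    return [c * scale + bias for c in codes]

def fold_weight_norm(g, v):
    """Compute w = g * v / ||v|| per output channel.

    g: one gain per output channel; v: nested list shaped [out][in][kernel].
    Assumes every channel of v has a non-zero norm.
    """
    folded = []
    for gain, channel in zip(g, v):
        norm = math.sqrt(sum(x * x for row in channel for x in row))
        folded.append([[gain * x / norm for x in row] for row in channel])
    return folded

def to_mlx_conv1d_layout(w):
    """Transpose [out][in][kernel] to [out][kernel][in]."""
    return [[list(col) for col in zip(*channel)] for channel in w]

# One group of 64 weights, matching group_size=64.
group = [i / 63 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

# A toy Conv1d weight with out=1, in=2, kernel=3.
v = [[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]]
folded = fold_weight_norm([10.0], v)
mlx_weight = to_mlx_conv1d_layout(folded)

print(max_error, mlx_weight)
```

Round-to-nearest keeps the reconstruction error of each weight within half a quantization step, which is small for the LLM's projections but can matter for components such as the flow decoder that the list above keeps in fp16.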

Usage

For use with soniqo/speech-swift:

swift

import CosyVoiceTTS

let model = try await CosyVoiceTTSModel.fromPretrained()
let audio = model.synthesize(text: "Hello, how are you?", language: "english")

CLI

bash

swift run cosyvoice-tts-cli --text "Hello, how are you?" --lang english --output hello.wav

License

Apache 2.0 (same as upstream CosyVoice 3)

Citation

bibtex

@article{du2025cosyvoice3,
  title={CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training},
  author={Du, Zhihao and others},
  journal={arXiv preprint arXiv:2505.17589},
  year={2025}
}

⚠️ Incomplete Data

Some information about this model is unavailable. Verify details against the original source before relying on this data.

📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
• License: Apache-2.0 per the Hugging Face metadata; verify the terms before commercial use.

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--aufklarer--cosyvoice3-0.5b-mlx-4bit
slug: aufklarer--cosyvoice3-0.5b-mlx-4bit
source: huggingface
author: aufklarer
license: Apache-2.0
tags: mlx, cosyvoice3, tts, text-to-speech, speech-synthesis, apple-silicon, cosyvoice, zh, en, ja, ko, de, es, fr, it, ru, arxiv:2505.17589, base_model:funaudiollm/fun-cosyvoice3-0.5b-2512, license:apache-2.0, region:us, voice-cloning, zero-shot

⚙️ Technical Specs

architecture: null
params billions: 0.5
context length: 4,096
pipeline tag: text-to-speech
vram gb: 1.7
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
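The listed estimate can be reproduced from the formula above. The sketch below assumes `params` is in billions and that the 0.75 coefficient means GB per billion parameters; the function and argument names are illustrative.

```python
def estimate_vram_gb(params_billions, gb_per_billion=0.75, kv_gb=0.8, os_gb=0.5):
    """Apply the listing's heuristic: weights + KV allowance + OS allowance."""
    return params_billions * gb_per_billion + kv_gb + os_gb

estimate = estimate_vram_gb(0.5)
print(round(estimate, 1))
```

This is a generic rule of thumb. It does not account for the fp16 flow decoder and vocoder described in the model card, so actual memory use may differ.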

📊 Engagement & Metrics

downloads: 391
stars: 0
forks: 0

Data indexed from public sources. Updated daily.
