Model

Kokoro 82m

Author: hexgrad
Registry ID: hf-model--hexgrad--kokoro-82m
Scale: 0.082B parameters
FNI Rank: 41 (Top 1% percentile)
Activity: 0.0%

**Kokoro** is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

FNI Score: 41 (audited)
Params: 0.082B (tiny)
Context: -
Downloads: 4.3M (hot)
Est. VRAM: ~2GB (fits an 8GB GPU)
Model Information Summary

Entity Passport
Registry ID: hf-model--hexgrad--kokoro-82m
Provider: huggingface
πŸ’Ύ

Compute Threshold

~1.4GB VRAM


* Estimated for 4-Bit Quantization. Actual usage varies by context length and parallel batching.

πŸ•ΈοΈ Neural Mesh Hub

Interconnecting Research, Data & Ecosystem

πŸ•ΈοΈ

Intelligence Hive

Multi-source Relation Matrix

Live Index
πŸ“œ

Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__hexgrad__kokoro_82m,
  author = {hexgrad},
  title = {Kokoro 82m Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/hexgrad/Kokoro-82M}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
hexgrad. (2026). Kokoro 82m [Model]. Free2AITools. https://huggingface.co/hexgrad/Kokoro-82M

πŸ”¬Technical Deep Dive


⚑ Quick Commands

πŸ¦™ Ollama Run
ollama run kokoro-82m
πŸ€— HF Download
huggingface-cli download hexgrad/Kokoro-82M

βš–οΈ Free2AI Nexus Index

Methodology β†’ πŸ“˜ What is FNI?
FNI Score: 41.0 (Top 1% Overall Impact)
🔥 Popularity (P): 0
🚀 Velocity (V): 0
🛡️ Credibility (C): 0
🔧 Utility (U): 0
Nexus Verified Data

πŸ’¬ Why this score?

Kokoro 82M currently shows component scores of P = 0 (popularity from downloads/likes), V = 0 (growth velocity), C = 0 (credibility from citations), and U = 0 (utility/deployment support); per the status line below, these components have not yet been calculated.

Data Verified πŸ• Last Updated: Not calculated
Free2AI Nexus Index | Fair Β· Transparent Β· Explainable | Full Methodology
---

πŸš€ What's Next?

README


license: apache-2.0
language:
- en
base_model:
- yl4579/StyleTTS2-LJSpeech
pipeline_tag: text-to-speech

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

🐈 GitHub: https://github.com/hexgrad/kokoro

πŸš€ Demo: https://hf.co/spaces/hexgrad/Kokoro-TTS

[!NOTE]
As of April 2025, the market rate of Kokoro served over API is under $1 per million characters of text input, or under $0.06 per hour of audio output. (On average, 1000 characters of input is about 1 minute of output.) Sources: ArtificialAnalysis/Replicate at 65 cents per M chars and DeepInfra at 80 cents per M chars.
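The note's figures are easy to sanity-check: at roughly 1,000 input characters per minute of output, one hour of audio consumes about 60,000 characters. A minimal sketch of the arithmetic, using the rates quoted above (function and variable names are mine):

```python
# Sanity-check the pricing note: ~1000 characters of input per minute of
# audio output means ~60,000 characters per hour of audio.
chars_per_minute = 1000  # average stated in the note
chars_per_hour = chars_per_minute * 60

def cost_per_hour(rate_per_million_chars: float) -> float:
    """Cost of one hour of audio output at a given $-per-million-chars rate."""
    return chars_per_hour / 1_000_000 * rate_per_million_chars

# At $1/M chars, one hour of audio costs $0.06, matching the note's
# "under $0.06 per hour of audio output".
print(f"${cost_per_hour(1.00):.2f}/hour at $1.00/M chars")
print(f"${cost_per_hour(0.65):.3f}/hour at $0.65/M chars")  # Replicate rate
```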

This is an Apache-licensed model, and Kokoro has been deployed in numerous projects and commercial APIs. We welcome the deployment of the model in real use cases.

[!CAUTION]
Fake websites like kokorottsai_com (snapshot: https://archive.ph/nRRnk) and kokorotts_net (snapshot: https://archive.ph/60opa) are likely scams masquerading under the banner of a popular model.

Any website containing "kokoro" in its root domain (e.g. kokorottsai_com, kokorotts_net) is NOT owned by and NOT affiliated with this model page or its author, and attempts to imply otherwise are red flags.

Releases

| Model | Published | Training Data | Langs & Voices | SHA256 |
|-------|-----------|---------------|----------------|--------|
| v1.0  | 2025 Jan 27 | Few hundred hrs | 8 & 54 | 496dba11 |
| v0.19 | 2024 Dec 25 | <100 hrs | 1 & 10 | 3b0c392f |
| Training Costs | v0.19 | v1.0 | Total |
|----------------|-------|------|-------|
| A100 80GB GPU hours | 500 | 500 | 1000 |
| Average hourly rate | $0.80/h | $1.20/h | $1/h |
| In USD | $400 | $600 | $1000 |
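The training-cost figures are a straightforward product of hours and rates; a quick sketch reproducing them (names are mine, numbers from the table above):

```python
# Reproduce the training-cost table: 500 A100-80GB GPU hours per version.
runs = {"v0.19": (500, 0.80), "v1.0": (500, 1.20)}  # (GPU hours, $/hour)

costs = {name: hours * rate for name, (hours, rate) in runs.items()}
total_hours = sum(hours for hours, _ in runs.values())
total_cost = sum(costs.values())
blended_rate = total_cost / total_hours  # the "$1/h" average in the table

print(costs)         # per-version cost in USD
print(total_cost)    # 1000.0
print(blended_rate)  # 1.0
```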

Usage

You can run this basic cell on Google Colab. Listen to samples. For more languages and details, see Advanced Usage.

# Quote the requirement spec so ">" is not treated as shell redirection:
!pip install -q "kokoro>=0.9.2" soundfile
!apt-get -qq -y install espeak-ng > /dev/null 2>&1

from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf

pipeline = KPipeline(lang_code='a')  # 'a' = American English
text = '''
[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''
generator = pipeline(text, voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)  # graphemes and phonemes for each generated segment
    display(Audio(data=audio, rate=24000, autoplay=i == 0))
    sf.write(f'{i}.wav', audio, 24000)  # Kokoro outputs 24 kHz audio

Under the hood, Kokoro uses misaki, a G2P library (https://github.com/hexgrad/misaki).
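The pipeline above yields one audio segment per text chunk; to write a single file instead of one per segment, you can concatenate the segments first. A minimal sketch, assuming each yielded `audio` value converts to a NumPy float array of 24 kHz samples (the stand-in arrays below replace real pipeline output):

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro outputs 24 kHz audio

def concat_segments(segments, pause_s=0.2):
    """Join audio segments, inserting a short silence between them."""
    pause = np.zeros(int(SAMPLE_RATE * pause_s), dtype=np.float32)
    parts = []
    for seg in segments:
        parts.append(np.asarray(seg, dtype=np.float32))
        parts.append(pause)
    # Drop the trailing pause; handle the empty case gracefully.
    return np.concatenate(parts[:-1]) if parts else np.zeros(0, np.float32)

# Stand-ins for the `audio` arrays yielded by KPipeline (1s and 2s of silence):
segments = [np.zeros(SAMPLE_RATE, np.float32), np.zeros(2 * SAMPLE_RATE, np.float32)]
combined = concat_segments(segments)
print(len(combined) / SAMPLE_RATE)  # 3.2 seconds: 1s + 0.2s pause + 2s
```

With real pipeline output, the combined array can then be written once, e.g. `sf.write('full.wav', combined, 24000)`.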

Model Facts

Architecture: StyleTTS 2, architected by Li et al. (https://github.com/yl4579/StyleTTS2)

Trained by: @rzvzn on Discord

Languages: Multiple

Model SHA256 Hash: 496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4
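The published SHA256 hash lets you verify downloaded weights before use. A minimal sketch, assuming a local weights file (the commented-out file name is illustrative; substitute the path you actually downloaded):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hash published for Kokoro v1.0 on this page:
EXPECTED = "496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4"

# Example check (file name is an assumption about your local download):
# assert sha256_of_file("kokoro-v1_0.pth") == EXPECTED
```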

Training Details

Data: Kokoro was trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:

  • Public domain audio
  • Audio licensed under Apache, MIT, etc.
  • Synthetic audio[1] generated by closed[2] TTS models from large providers

    [1] https://copyright.gov/ai/ai_policy_guidance.pdf

    [2] No synthetic audio from open TTS models or "custom voice clones"

Total Dataset Size: A few hundred hours of audio

Total Training Cost: About $1000 for 1000 A100 80GB GPU hours

Creative Commons Attribution

The following CC BY audio was part of the dataset used to train Kokoro v1.0.

| Audio Data | Duration Used | License | Added to Training Set After |
|------------|---------------|---------|------------------------------|
| Koniwa tnc | <1h | CC BY 3.0 | v0.19 / 22 Nov 2024 |
| SIWIS | <11h | CC BY 4.0 | v0.19 / 22 Nov 2024 |

Acknowledgements

  • πŸ› οΈ @yl4579 for architecting StyleTTS 2.
  • πŸ† @Pendrokar for adding Kokoro as a contender in the TTS Spaces Arena.
  • πŸ“Š Thank you to everyone who contributed synthetic training data.
  • ❀️ Special thanks to all compute sponsors.
  • πŸ‘Ύ Discord server: https://discord.gg/QuGxSWBfQy
  • πŸͺ½ Kokoro is a Japanese word that translates to "heart" or "spirit". It is also the name of an AI in the Terminator franchise.

πŸ“ Limitations & Considerations

  • β€’ Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • β€’ VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • β€’ FNI scores are relative rankings and may change as new models are added.
  • ⚠ License Unknown: Verify licensing terms before commercial use.
  • β€’ Source: Unknown
Social Proof

HuggingFace Hub (Top Tier)
  • Likes: 5.4K
  • Downloads: 4.3M
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

πŸ“Š FNI Methodology πŸ“š Knowledge Baseℹ️ Verify with original source

πŸ›‘οΈ Model Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

πŸ†” Identity & Source

id
hf-model--hexgrad--kokoro-82m
source
huggingface
author
hexgrad
tags
text-to-speech, en, arxiv:2306.07691, arxiv:2203.02395, base_model:yl4579/styletts2-ljspeech, base_model:finetune:yl4579/styletts2-ljspeech, doi:10.57967/hf/4329, license:apache-2.0, region:us

βš™οΈ Technical Specs

architecture: null
params (billions): 0.082
context length: null
pipeline tag: null
vram (GB): 1.4
vram is estimated: true
vram formula: VRAM ≈ (params × 0.75) + 0.8GB (KV) + 0.5GB (OS)

πŸ“Š Engagement & Metrics

likes: 5,378
downloads: 4,344,156

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)