Supertonic 2
Cite this model
Academic & Research Attribution
@misc{hf_model__supertone__supertonic_2,
author = {Supertone},
title = {Supertonic 2 Model},
year = {2026},
howpublished = {\url{https://huggingface.co/supertone/supertonic-2}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Quick Commands
huggingface-cli download supertone/supertonic-2
Supertonic 2: Lightning-Fast, On-Device, Multilingual TTS

Supertonic is a lightning-fast, on-device text-to-speech system designed for extreme performance with minimal computational overhead. Powered by ONNX Runtime, it runs entirely on your device: no cloud, no API calls, no privacy concerns.
What's New in Supertonic 2
Supertonic 2 extends multilingual capabilities while maintaining the same inference speed and efficiency as the original.
Multilingual Support
| Language | Code |
|---|---|
| English | en |
| Korean | ko |
| Spanish | es |
| Portuguese | pt |
| French | fr |
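The table above is effectively a small lookup of supported language codes. As a minimal sketch, assuming an application wants to normalize user-supplied tags such as `EN` or `pt-BR` before calling the model, the codes can be kept in a dictionary and validated up front. `SUPPORTED_LANGUAGES` and `resolve_language` are hypothetical names, not part of Supertonic; how the model itself receives a language code is not described in this card.

```python
import re

# Codes copied from the Multilingual Support table. How Supertonic itself
# receives the language at inference time is an open assumption here.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "ko": "Korean",
    "es": "Spanish",
    "pt": "Portuguese",
    "fr": "French",
}


def resolve_language(tag: str) -> str:
    """Hypothetical helper: map 'EN', 'pt-BR', 'ko_KR', ... to a supported base code."""
    # Lower-case, then keep only the part before any region suffix ('-' or '_').
    base = re.split(r"[-_]", tag.strip().lower())[0]
    if base not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language {tag!r}; expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return base


code = resolve_language("pt-BR")
print(code, SUPPORTED_LANGUAGES[code])
```

Rejecting unsupported codes early gives a clear error instead of whatever behavior the model would show on an unseen language.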
Same Speed, More Languages
- No speed degradation: Supertonic 2 delivers the same ultra-fast inference speed as the original, up to 167× faster than real time
- Efficient architecture: Only 66M parameters, optimized for on-device deployment
- Cross-language consistency: All supported languages share the same model architecture and inference pipeline
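The 66M-parameter figure gives a rough sense of why the model fits on-device. Below is a back-of-the-envelope sketch, assuming dense weights and counting nothing but the parameters themselves (no activations, runtime buffers, or file overhead). The card does not say which numeric precision the released ONNX files use, so several are shown.

```python
# Raw weight storage for a 66M-parameter model at a few common precisions.
# Assumption: parameters only; activations and runtime overhead are ignored.
NUM_PARAMS = 66_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_megabytes(num_params: int, bytes_per_param: int) -> float:
    """Return raw weight size in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_megabytes(NUM_PARAMS, nbytes):.0f} MB")
```

This prints roughly 264 MB (fp32), 132 MB (fp16) and 66 MB (int8); actual file size and runtime memory will differ.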
Performance
We evaluated Supertonic's performance (with 2 inference steps) using two key metrics across input texts of varying lengths: Short (59 chars), Mid (152 chars), and Long (266 chars).
Metrics:
- Characters per Second: Measures throughput by dividing the number of input characters by the time required to generate audio. Higher is better.
- Real-time Factor (RTF): Measures the time taken to synthesize audio relative to its duration. Lower is better (e.g., RTF of 0.1 means it takes 0.1 seconds to generate one second of audio).
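Both metrics reduce to simple ratios of three measured quantities: the number of input characters, the wall-clock synthesis time, and the duration of the generated audio. The sketch below computes them for any TTS function. `Synthesizer`, `benchmark` and `fake_tts` are hypothetical stand-ins (the fake engine returns one second of silence at an assumed 24 kHz) and do not reflect Supertonic's actual API or sample rate.

```python
import time
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a TTS callable that takes text and returns
# (audio_samples, sample_rate). This is NOT Supertonic's actual API.
Synthesizer = Callable[[str], Tuple[Sequence[float], int]]


def chars_per_second(num_chars: int, synthesis_seconds: float) -> float:
    """Throughput: input characters / wall-clock synthesis time (higher is better)."""
    return num_chars / synthesis_seconds


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF: synthesis time / duration of the generated audio (lower is better)."""
    return synthesis_seconds / audio_seconds


def benchmark(synthesize: Synthesizer, text: str) -> Tuple[float, float]:
    """Time one synthesis call and return (chars/sec, RTF)."""
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return chars_per_second(len(text), elapsed), real_time_factor(elapsed, audio_seconds)


def fake_tts(text: str) -> Tuple[Sequence[float], int]:
    """Stand-in engine: sleeps briefly, then returns 1 s of silence at 24 kHz."""
    time.sleep(0.01)
    return [0.0] * 24_000, 24_000


# Worked example from the definition: 0.1 s of compute for 1 s of audio.
print(real_time_factor(0.1, 1.0))
print(benchmark(fake_tts, "x" * 59))
```

A single timed call like this also includes any first-run warm-up; a fair benchmark would typically discard warm-up runs and average several repetitions, though the card does not state how its numbers were collected.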
Characters per Second
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|---|---|---|---|
| Supertonic (M4 pro - CPU) | 912 | 1048 | 1263 |
| Supertonic (M4 pro - WebGPU) | 996 | 1801 | 2509 |
| Supertonic (RTX4090) | 2615 | 6548 | 12164 |
| ElevenLabs Flash v2.5 (API) | 144 | 209 | 287 |
| OpenAI TTS-1 (API) | 37 | 55 | 82 |
| Gemini 2.5 Flash TTS (API) | 12 | 18 | 24 |
| Supertone Sona speech 1 (API) | 38 | 64 | 92 |
| Kokoro (Open) | 104 | 107 | 117 |
| NeuTTS Air (Open) | 37 | 42 | 47 |
Notes:
- API: cloud-based API services (measured from Seoul)
- Open: open-source models
- Supertonic (M4 pro - CPU) and (M4 pro - WebGPU): tested with ONNX
- Supertonic (RTX4090): tested with the PyTorch model
- Kokoro: tested on M4 Pro CPU with ONNX
- NeuTTS Air: tested on M4 Pro CPU with Q8-GGUF
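Relative throughput follows directly from the Characters per Second table by division. The sketch below copies its long-text (266 chars) column and ranks the systems; the `(API)`/`(Open)` suffixes are just labels chosen here. Per the notes, only the Supertonic CPU row and Kokoro were measured on the same hardware and runtime (M4 Pro CPU, ONNX), so that ratio is the closest like-for-like comparison; API figures were measured from Seoul and presumably include network round-trip time, and the RTX4090 row uses the PyTorch model.

```python
# Long-text (266 chars) column of the Characters per Second table.
LONG_TEXT_CPS = {
    "Supertonic (M4 pro - CPU)": 1263,
    "Supertonic (M4 pro - WebGPU)": 2509,
    "Supertonic (RTX4090)": 12164,
    "ElevenLabs Flash v2.5 (API)": 287,
    "OpenAI TTS-1 (API)": 82,
    "Gemini 2.5 Flash TTS (API)": 24,
    "Supertone Sona speech 1 (API)": 92,
    "Kokoro (Open)": 117,
    "NeuTTS Air (Open)": 47,
}


def speedup(system: str, baseline: str, table=LONG_TEXT_CPS) -> float:
    """How many times more characters per second `system` handles than `baseline`."""
    return table[system] / table[baseline]


# Rank systems by throughput, fastest first.
for name in sorted(LONG_TEXT_CPS, key=LONG_TEXT_CPS.get, reverse=True):
    print(f"{name:32s} {LONG_TEXT_CPS[name]:>6d} chars/s")

# Same-hardware comparison (both on M4 Pro CPU with ONNX, per the notes).
print(f"{speedup('Supertonic (M4 pro - CPU)', 'Kokoro (Open)'):.1f}x")
```

With these values, Supertonic on the M4 Pro CPU processes about 10.8× as many characters per second as Kokoro on long inputs.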
Real-time Factor
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|---|---|---|---|
| Supertonic (M4 pro - CPU) | 0.015 | 0.013 | 0.012 |
| Supertonic (M4 pro - WebGPU) | 0.014 | 0.007 | 0.006 |
| Supertonic (RTX4090) | 0.005 | 0.002 | 0.001 |
| ElevenLabs Flash v2.5 (API) | 0.133 | 0.077 | 0.057 |
| OpenAI TTS-1 (API) | 0.471 | 0.302 | 0.201 |
| Gemini 2.5 Flash TTS (API) | 1.060 | 0.673 | 0.541 |
| Supertone Sona speech 1 (API) | 0.372 | 0.206 | 0.163 |
| Kokoro (Open) | 0.144 | 0.124 | 0.126 |
| NeuTTS Air (Open) | 0.390 | 0.338 | 0.343 |
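RTF is easier to read as its reciprocal, the number of seconds of audio produced per second of compute. Multiplying the two metrics is also a useful consistency check: since cps = N / t_synth and RTF = t_synth / t_audio, their product is N / t_audio, the number of input characters per second of generated speech, which one would expect to stay in a similar range across rows. A minimal sketch using values from the tables:

```python
def speed_multiplier(rtf: float) -> float:
    """Seconds of audio produced per second of compute (1 / RTF)."""
    return 1.0 / rtf


def implied_speech_rate(cps: float, rtf: float) -> float:
    """(chars / t_synth) * (t_synth / t_audio) = input characters per second of audio."""
    return cps * rtf


# Long-text RTF for Supertonic on M4 Pro WebGPU.
print(f"{speed_multiplier(0.006):.0f}x faster than real time")
# Short-text row for Supertonic on M4 Pro CPU: 912 chars/s and RTF 0.015.
print(f"{implied_speech_rate(912, 0.015):.2f} chars per second of audio")
```

1 / 0.006 ≈ 167, which appears to line up with the "up to 167×" figure quoted in the feature list. The RTF values are rounded to three decimals, so reciprocals of very small entries are coarse: a reported 0.001 could correspond to anything from roughly 670× to 2000× real time.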
Additional Performance Data (5-step inference)
Characters per Second (5-step)
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|---|---|---|---|
| Supertonic (M4 pro - CPU) | 596 | 691 | 850 |
| Supertonic (M4 pro - WebGPU) | 570 | 1118 | 1546 |
| Supertonic (RTX4090) | 1286 | 3757 | 6242 |
Real-time Factor (5-step)
| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|---|---|---|---|
| Supertonic (M4 pro - CPU) | 0.023 | 0.019 | 0.018 |
| Supertonic (M4 pro - WebGPU) | 0.024 | 0.012 | 0.010 |
| Supertonic (RTX4090) | 0.011 | 0.004 | 0.002 |
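Comparing the 5-step tables with the 2-step ones shows how much throughput the extra inference steps cost. If synthesis time scaled in proportion to the number of steps, 5-step throughput would be 2/5 = 0.4 of the 2-step value; the sketch below computes the observed fraction from the published numbers.

```python
# Characters per second for Short / Mid / Long inputs, copied from the tables.
CPS_2_STEP = {
    "M4 pro - CPU": (912, 1048, 1263),
    "M4 pro - WebGPU": (996, 1801, 2509),
    "RTX4090": (2615, 6548, 12164),
}
CPS_5_STEP = {
    "M4 pro - CPU": (596, 691, 850),
    "M4 pro - WebGPU": (570, 1118, 1546),
    "RTX4090": (1286, 3757, 6242),
}

# Fraction expected if run time were strictly proportional to the step count.
PROPORTIONAL_RATIO = 2 / 5


def retained_throughput(device: str) -> list:
    """Fraction of 2-step chars/sec kept at 5 steps, for each input length."""
    return [round(five / two, 2) for two, five in zip(CPS_2_STEP[device], CPS_5_STEP[device])]


for device in CPS_2_STEP:
    print(device, retained_throughput(device))
```

Every observed fraction falls between about 0.49 and 0.67, above 0.4, which suggests that part of the per-utterance cost does not grow with the step count; the card does not break down where that time is spent.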
License
This project's sample code is released under the MIT License; see the LICENSE file for details.
The accompanying model is released under the OpenRAIL-M License; see the LICENSE file for details.
This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project; see the LICENSE file for details.
Copyright (c) 2026 Supertone Inc.