VibeVoice-1.5B
by microsoft
--- language: - en - zh license: mit pipeline_tag: text-to-speech tags: - Podcast library_name: transformers --- VibeVoice is a novel framework designed for gen
π§ Architecture Explorer
Neural network architecture
About
VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking. A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic a...
π Limitations & Considerations
- β’ Benchmark scores may vary based on evaluation methodology and hardware configuration.
- β’ VRAM requirements are estimates; actual usage depends on quantization and batch size.
- β’ FNI scores are relative rankings and may change as new models are added.
- β’ Data source: [{"source_platform":"huggingface","source_url":"https://huggingface.co/microsoft/VibeVoice-1.5B","fetched_at":"2025-12-19T07:41:01.175Z","adapter_version":"3.2.0"}]
π Related Resources
π Related Papers
No related papers linked yet. Check the model's official documentation for research papers.
π Training Datasets
Training data information not available. Refer to the original model card for details.