VoxEval
Pillar scores are computed during the next indexing cycle.
Github repository for paper: *VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models* Also check out our survey paper at *Recent Advances in Speech Language Models: A Survey*! **VoxEval** is a novel speech question-answering benchmark specifically designed to assess SLMs' knowledge understanding through purely speech-based interactions. Below are the three highlights of our VoxEval benchmark: - End-to-end speech-based eva...
| Entity Passport | |
| Registry ID | hf-dataset--keyng101--voxeval |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__keyng101__voxeval,
author = {KeyNG101},
title = {VoxEval Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/KeyNG101/VoxEval}},
note = {Accessed via Free2AITools Knowledge Fortress}
} đŦTechnical Deep Dive
Full Specifications [+]âž
âī¸ Nexus Index V2.0
đŦ Index Insight
FNI V2.0 for VoxEval: Semantic (S:50), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).
Verification Authority
đī¸ Data Preview
Row-level preview not available for this dataset.
Schema structure is shown in the Field Logic panel when available.
đ Explore Full Dataset âđ§Ŧ Field Logic
Schema not yet indexed for this dataset.
Dataset Specification
license: cc-by-4.0
VoxEval
Github repository for paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
Also check out our survey paper at Recent Advances in Speech Language Models: A Survey!
VoxEval is a novel speech question-answering benchmark specifically designed to assess SLMs' knowledge understanding through purely speech-based interactions.
Below are the three highlights of our VoxEval benchmark:
- End-to-end speech-based evaluation: Both input and output are audio-based.
- Diverse audio conditions: VoxEval includes audio files featuring a variety of speakers, speaking styles, and audio qualities.
- Complex spoken evaluations: It supports advanced spoken assessments, including spoken math tasks.
Download Data
You can access our VoxEval Dataset Repository on đ¤ Hugging Face to directly download the dataset.
Below is the layout of the dataset folder. all_fewshot_examples folder contains the few shot examples for the evaluation. math_CoT_fewshot folder contains the few shot examples for evaluating the math subjects via Chain-of-Thought prompting.test folder contains the actual test data in VoxEval.
# âââ Root Folder
# â âââ all_fewshot_examples
# â âââ alloy (different speaker voice)
# â âââ abstract_algebra_4o (different subjects)
# â âââ XXX.mp3
# â âââ ...
# â âââ ...
# â âââ echo
# â âââ ...
# â âââ test
# â âââ alloy (different settings)
# â âââ abstract_algebra_4o (different subjects)
# â âââ XXX.mp3
# â âââ ...
# â âââ ...
# â âââ echo
# â âââ ...
Evaluation results of existing end-to-end Spoken Language Models
| SLMs | SpeechGPT | TWIST | SPIRIT-LM | Moshi | GLM-4-Voice |
|---|---|---|---|---|---|
| Speakers | |||||
| Alloy | 0.0001 | 0.0480 | 0.2084 | 0.1216 | 0.3763 |
| Echo | 0.0001 | 0.0558 | 0.2096 | 0.1221 | 0.3764 |
| Fable | 0.0000 | 0.0116 | 0.2084 | 0.1153 | 0.3642 |
| Nova | 0.0001 | 0.0332 | 0.2070 | 0.1298 | 0.3677 |
| Onyx | 0.0002 | 0.0275 | 0.1966 | 0.1192 | 0.3764 |
| Shimmer | 0.0000 | 0.0516 | 0.2076 | 0.1205 | 0.3815 |
| Speaking Styles | |||||
| Linguistic | 0.0001 | 0.0488 | 0.2044 | 0.1187 | 0.3643 |
| Speed | 0.0001 | 0.0503 | 0.1911 | 0.1013 | 0.3469 |
| Pitch | 0.0000 | 0.0544 | 0.1788 | 0.0609 | 0.3345 |
| Audio Qualities | |||||
| Noise | 0.0000 | 0.0368 | 0.1950 | 0.1018 | 0.3695 |
| Other Env Acoustics | 0.0001 | 0.0434 | 0.2019 | 0.1051 | 0.3728 |
| Underlying Text LMs | Llama-7B | Llama-7B | Llama-2-7B | Helium-7B | GLM-4-9B |
| Text MMLU | 0.3510 | 0.3510 | 0.4530 | 0.5430 | 0.7470 |
What's Next
- The VoxEval evaluation code will be released soon.
License
The dataset is licensed under the Creative Commons Attribution 4.0.
Citation
@article{cui2025voxeval,
title={VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models},
author={Cui, Wenqian and Jiao, Xiaoqi and Meng, Ziqiao and King, Irwin},
journal={arXiv preprint arXiv:2501.04962},
year={2025}
}
Social Proof
AI Summary: Based on Hugging Face metadata. Not a recommendation.
đĄī¸ Dataset Transparency Report
Verified data manifest for traceability and transparency.
đ Identity & Source
- id
- hf-dataset--keyng101--voxeval
- source
- huggingface
- author
- KeyNG101
- tags
- license:cc-by-4.0arxiv:2501.04962arxiv:2410.03751region:us
âī¸ Technical Specs
- architecture
- null
- params billions
- null
- context length
- null
đ Engagement & Metrics
- likes
- 0
- downloads
- 36,148
Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)