Dataset

VoxEval

Name: VoxEval
Creator: KeyNG101
License: CC-BY-4.0

Nexus Index

34.0 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 62

R: Recency 53

Q: Quality 30

Tech Context

Vital Performance

0 DL / 30D

0.0%

Source →

Data Integrity 34 FNI Score

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--keyng101--voxeval
License	CC-BY-4.0
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__keyng101__voxeval,
  author = {KeyNG101},
  title = {VoxEval Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/keyng101/voxeval}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

KeyNG101. (2026). VoxEval [Dataset]. Free2AITools. https://huggingface.co/datasets/keyng101/voxeval

🔬Technical Deep Dive

⚖️ Nexus Index V2.0

Methodology Index Protocol

💬 Index Insight

FNI V2.0 for VoxEval: Semantic (S:50), Authority (A:0), Popularity (P:62), Recency (R:53), Quality (Q:30).

Free2AITools Nexus Index

Verification Authority

HuggingFace API, GitHub Metadata, Arxiv Citation DB, System Audit

Unbiased Data Node Refresh: VFS Live

⬇️

Downloads

214,022

Dataset Specification

VoxEval

GitHub repository for the paper "VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models".

Also check out our survey paper, "Recent Advances in Speech Language Models: A Survey".

VoxEval is a speech question-answering benchmark designed to assess the knowledge understanding of spoken language models (SLMs) through purely speech-based interactions.

Below are the three highlights of our VoxEval benchmark:

End-to-end speech-based evaluation: Both input and output are audio-based.
Diverse audio conditions: VoxEval includes audio files featuring a variety of speakers, speaking styles, and audio qualities.
Complex spoken evaluations: It supports advanced spoken assessments, including spoken math tasks.

Download Data

You can download the dataset directly from our VoxEval Dataset Repository on 🤗 Hugging Face.

The dataset folder is laid out as shown below. The all_fewshot_examples folder contains the few-shot examples for the evaluation. The math_CoT_fewshot folder contains the few-shot examples for evaluating the math subjects via Chain-of-Thought prompting. The test folder contains the actual VoxEval test data.

  Root Folder
  ├── all_fewshot_examples
  │   ├── alloy (different speaker voice)
  │   │   ├── abstract_algebra_4o (different subjects)
  │   │   │   ├── XXX.mp3
  │   │   │   └── ...
  │   │   └── ...
  │   ├── echo
  │   └── ...
  └── test
      ├── alloy (different settings)
      │   ├── abstract_algebra_4o (different subjects)
      │   │   ├── XXX.mp3
      │   │   └── ...
      │   └── ...
      ├── echo
      └── ...
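Once the dataset is downloaded, the folder layout above can be walked with a few lines of Python. The following is a minimal sketch, not part of the VoxEval release: the function name index_split and the local root path are hypothetical, and it assumes every audio file sits exactly three levels below the root (split, then voice or setting, then subject), as the layout suggests.

```python
from collections import defaultdict
from pathlib import Path


def index_split(root, split="test"):
    """Group .mp3 files under root/split by (setting, subject) folder names.

    `index_split` is a hypothetical helper for illustration only.
    """
    index = defaultdict(list)
    split_dir = Path(root) / split
    # Assumed depth: <split>/<voice or setting>/<subject>/<file>.mp3
    for mp3 in sorted(split_dir.glob("*/*/*.mp3")):
        setting = mp3.parent.parent.name  # e.g. "alloy"
        subject = mp3.parent.name         # e.g. "abstract_algebra_4o"
        index[(setting, subject)].append(mp3)
    return dict(index)
```

Calling index_split("path/to/VoxEval", "all_fewshot_examples") would index the few-shot folder in the same way, since it follows the same nesting.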

Evaluation results of existing end-to-end Spoken Language Models

SLMs	SpeechGPT	TWIST	SPIRIT-LM	Moshi	GLM-4-Voice
Speakers
Alloy	0.0001	0.0480	0.2084	0.1216	0.3763
Echo	0.0001	0.0558	0.2096	0.1221	0.3764
Fable	0.0000	0.0116	0.2084	0.1153	0.3642
Nova	0.0001	0.0332	0.2070	0.1298	0.3677
Onyx	0.0002	0.0275	0.1966	0.1192	0.3764
Shimmer	0.0000	0.0516	0.2076	0.1205	0.3815
Speaking Styles
Linguistic	0.0001	0.0488	0.2044	0.1187	0.3643
Speed	0.0001	0.0503	0.1911	0.1013	0.3469
Pitch	0.0000	0.0544	0.1788	0.0609	0.3345
Audio Qualities
Noise	0.0000	0.0368	0.1950	0.1018	0.3695
Other Env Acoustics	0.0001	0.0434	0.2019	0.1051	0.3728
Underlying Text LMs	Llama-7B	Llama-7B	Llama-2-7B	Helium-7B	GLM-4-9B
Text MMLU	0.3510	0.3510	0.4530	0.5430	0.7470
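One way to read the table is to compare each SLM's spoken score with the text MMLU score of its underlying text LM. The sketch below copies the six speaker rows and the Text MMLU row from the table above, averages the speaker rows per model, and subtracts that mean from the text score. The averaging and the name speech_text_gap are illustrative choices made here, not a metric defined by the VoxEval authors.

```python
from statistics import mean

# Speaker rows from the table above, in order:
# Alloy, Echo, Fable, Nova, Onyx, Shimmer.
SPEAKER_SCORES = {
    "SpeechGPT": [0.0001, 0.0001, 0.0000, 0.0001, 0.0002, 0.0000],
    "TWIST": [0.0480, 0.0558, 0.0116, 0.0332, 0.0275, 0.0516],
    "SPIRIT-LM": [0.2084, 0.2096, 0.2084, 0.2070, 0.1966, 0.2076],
    "Moshi": [0.1216, 0.1221, 0.1153, 0.1298, 0.1192, 0.1205],
    "GLM-4-Voice": [0.3763, 0.3764, 0.3642, 0.3677, 0.3764, 0.3815],
}

# "Text MMLU" row: score of each SLM's underlying text LM.
TEXT_MMLU = {
    "SpeechGPT": 0.3510,
    "TWIST": 0.3510,
    "SPIRIT-LM": 0.4530,
    "Moshi": 0.5430,
    "GLM-4-Voice": 0.7470,
}


def speech_text_gap(model):
    """Return (mean spoken score across speakers, text MMLU minus that mean)."""
    spoken = mean(SPEAKER_SCORES[model])
    return spoken, TEXT_MMLU[model] - spoken


for model in SPEAKER_SCORES:
    spoken, gap = speech_text_gap(model)
    print(f"{model:12s} spoken={spoken:.4f} gap={gap:.4f}")
```

A positive gap means the spoken model scores below its text-only backbone on these rows.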

What's Next

The VoxEval evaluation code will be released soon.

License

The dataset is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0).

Citation

@article{cui2025voxeval,
  title={VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models},
  author={Cui, Wenqian and Jiao, Xiaoqi and Meng, Ziqiao and King, Irwin},
  journal={arXiv preprint arXiv:2501.04962},
  year={2025}
}

Social Proof

HuggingFace Hub

214.0K Downloads

Hub Discussions

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--keyng101--voxeval
slug: keyng101--voxeval
source: huggingface
author: KeyNG101
license: CC-BY-4.0
tags: license:cc-by-4.0, arxiv:2501.04962, arxiv:2410.03751, region:us

⚙️ Technical Specs

architecture: null
params billions: null
context length: null
pipeline tag

📊 Engagement & Metrics

downloads: 214,022
stars: 0
forks: 0

Data indexed from public sources. Updated daily.
