📊

Dataset

yodas2

by espnet hf-dataset--espnet--yodas2

Nexus Index

36.0 Top 0%

P / V / C / U Breakdown Calibration Pending

Pillar scores are computed during the next indexing cycle.

Tech Context

Vital Performance

0 DL / 30D

0.0%



Data Integrity 36 FNI Score

Parquet Format


Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--espnet--yodas2
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__espnet__yodas2,
  author = {espnet},
  title = {yodas2 Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/espnet/yodas2}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

espnet. (2026). yodas2 [Dataset]. Free2AITools. https://huggingface.co/datasets/espnet/yodas2

🔬Technical Deep Dive


⚖️ Nexus Index V16.5

Methodology Index Protocol

36.0

ESTIMATED IMPACT TIER

Popularity (P) 0

Freshness (F) 0

Completeness (C) 0

Utility (U) 0

💬 Index Insight

The Free2AITools Nexus Index for yodas2 aggregates Popularity (P:0), Freshness (F:0), and Completeness (C:0). The Utility score (U:0) represents deployment readiness and ecosystem adoption.

Free2AITools Nexus Index

Verification Authority

HuggingFace API GitHub Metadata Arxiv Citation DB System Audit


Downloads: 55,573

Likes: 45


Dataset Specification

license: cc-by-3.0

YODAS2 is the long-form version of the YODAS dataset.

It provides the same data as espnet/yodas, with the following new features:

- It is formatted in long form (video level): the audio is not segmented.
- The audio is encoded at a higher sampling rate (24 kHz).
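Because each example holds a whole video at 24 kHz rather than short 16 kHz segments, the audio arrays are long: their length is roughly `duration × sampling_rate`. The sketch below shows that arithmetic, assuming `duration` is in seconds as described in the field list; `expected_num_samples` is a hypothetical helper, not part of the dataset or the `datasets` library.

```python
def expected_num_samples(duration_s: float, sampling_rate: int = 24_000) -> int:
    """Approximate waveform length for a clip of `duration_s` seconds.

    Hypothetical helper: YODAS2 audio is 24 kHz, espnet/yodas audio is 16 kHz.
    """
    return round(duration_s * sampling_rate)


# One hour of YODAS2 audio vs. the same hour at the 16 kHz rate of espnet/yodas.
hour = 3600.0
print(expected_num_samples(hour))          # 86400000
print(expected_num_samples(hour, 16_000))  # 57600000
```

The same hour of speech therefore takes 1.5 times as many samples in YODAS2 as in espnet/yodas.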

For detailed information about the YODAS dataset, please refer to our paper and the espnet/yodas repo.

Usage:

Each data point corresponds to an entire YouTube video and contains the following fields:

- video_id: unique ID of this video (note: this is not the YouTube video ID)
- duration: total duration of this video, in seconds
- audio
  - path: local path to the wav file in standard mode; empty in streaming mode
  - sampling_rate: fixed at 24k (note that the sampling rate in espnet/yodas is 16k)
  - array: waveform samples as floats
- utterances
  - utt_id: unique ID of this utterance
  - text: transcription of this utterance
  - start: start timestamp of this utterance, in seconds
  - end: end timestamp of this utterance, in seconds
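Since the audio is not segmented, utterance-level clips have to be cut out of the long-form waveform using each utterance's `start` and `end` timestamps. A minimal sketch, assuming an example is a dict shaped like the field list above and that `audio['array']` supports slicing (a list or a NumPy array); `slice_utterances` is a hypothetical helper, not part of the dataset or `datasets`. Whether `utterances` arrives as a list of dicts or as a dict of parallel lists is also an assumption, so both shapes are accepted.

```python
from typing import Any, Dict, List


def _as_records(utterances: Any) -> List[Dict[str, Any]]:
    # Assumption: `datasets` may expose a sequence of structs either as a list
    # of dicts or as a dict of parallel lists; normalize to a list of dicts.
    if isinstance(utterances, dict):
        keys = list(utterances)
        return [dict(zip(keys, values)) for values in zip(*(utterances[k] for k in keys))]
    return list(utterances)


def slice_utterances(example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: cut per-utterance waveforms from a long-form example."""
    rate = example["audio"]["sampling_rate"]
    samples = example["audio"]["array"]
    segments = []
    for utt in _as_records(example["utterances"]):
        # Convert second-based timestamps to sample indices, clamped to the waveform.
        begin = max(0, int(round(utt["start"] * rate)))
        end = min(len(samples), int(round(utt["end"] * rate)))
        segments.append({"utt_id": utt["utt_id"], "text": utt["text"], "array": samples[begin:end]})
    return segments


# Toy example with an artificial 4 Hz "sampling rate" so the arrays stay short.
example = {
    "video_id": "demo",
    "duration": 1.0,
    "audio": {"path": "", "sampling_rate": 4, "array": [0.0, 0.1, 0.2, 0.3]},
    "utterances": [
        {"utt_id": "u0", "text": "hello", "start": 0.0, "end": 0.5},
        {"utt_id": "u1", "text": "world", "start": 0.5, "end": 1.0},
    ],
}
print([(s["utt_id"], s["array"]) for s in slice_utterances(example)])
# [('u0', [0.0, 0.1]), ('u1', [0.2, 0.3])]
```

With real data the sampling rate is 24k, so a 2-second utterance yields a 48,000-sample slice.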

YODAS2 also supports two modes:

standard mode: each subset is downloaded to the local disk before the first iteration.

```python
from datasets import load_dataset

# Note: this will take a very long time to download and preprocess.
# You can try a small subset (e.g. 'en000') for testing purposes.
ds = load_dataset('espnet/yodas2', 'en000')
print(next(iter(ds['train'])))
```

streaming mode: most of the files are streamed instead of downloaded to your local device. This mode is useful for inspecting the dataset quickly.

```python
from datasets import load_dataset

# This streaming load finishes quickly.
ds = load_dataset('espnet/yodas2', 'en000', streaming=True)
print(next(iter(ds['train'])))
```
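Streaming mode makes it cheap to look at only the first few videos. A sketch of a quick inspection pass, assuming the streamed split yields dicts with the fields listed above; `summarize` is a hypothetical helper that works on any iterable of such dicts, so it is demonstrated on a small in-memory list instead of a live download.

```python
from itertools import islice
from typing import Any, Dict, Iterable


def summarize(examples: Iterable[Dict[str, Any]], limit: int = 5) -> Dict[str, float]:
    """Hypothetical helper: count videos, utterances and hours in the first `limit` examples.

    Only `limit` items are pulled from the iterator, so this stays quick on a
    streamed split such as ds['train'] from load_dataset(..., streaming=True).
    """
    videos = utterances = 0
    seconds = 0.0
    for ex in islice(examples, limit):
        videos += 1
        seconds += ex["duration"]
        utts = ex["utterances"]
        # Accept both list-of-dicts and dict-of-lists representations (an assumption).
        utterances += len(utts["utt_id"]) if isinstance(utts, dict) else len(utts)
    return {"videos": videos, "utterances": utterances, "hours": seconds / 3600}


# Stand-in for a streamed split; with real data you would call summarize(ds['train'], limit=3).
fake = [
    {"duration": 1800.0, "utterances": [{"utt_id": "a"}, {"utt_id": "b"}]},
    {"duration": 1800.0, "utterances": {"utt_id": ["c"], "text": ["x"]}},
    {"duration": 60.0, "utterances": []},
]
print(summarize(fake, limit=2))  # {'videos': 2, 'utterances': 3, 'hours': 1.0}
```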

Reference

@inproceedings{li2023yodas,
  title={Yodas: Youtube-Oriented Dataset for Audio and Speech},
  author={Li, Xinjian and Takamichi, Shinnosuke and Saeki, Takaaki and Chen, William and Shiota, Sayaka and Watanabe, Shinji},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}

Contact

If you have any questions, feel free to contact us at the following email address.

During downloading, we made sure that our dataset consisted only of videos with CC licenses. However, if you find your video unintentionally included in our dataset and would like it removed, you can send a deletion request to the following email address.

Remove the parentheses () from the following email address:

(lixinjian)(1217)@gmail.com


🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


🛡️ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: hf-dataset--espnet--yodas2
source: huggingface
author: espnet
tags: license:cc-by-3.0, arxiv:2406.00899, region:us

⚙️ Technical Specs

architecture: null
params billions: null
context length: null

📊 Engagement & Metrics

likes: 45
downloads: 55,573

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)
