yodas2

by espnet (https://huggingface.co/datasets/espnet/yodas2)

Dataset Specification


License: cc-by-3.0

YODAS2 is the long-form version of the YODAS dataset.

It provides the same data as espnet/yodas, with the following new features:

  • it is formatted in long form (video level), so the audio is not segmented;
  • the audio is encoded at a higher sampling rate (24 kHz).
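YODAS2 audio is stored at 24 kHz, while espnet/yodas uses 16 kHz. Combining the two, or feeding YODAS2 audio to a 16 kHz model, therefore requires resampling. The sketch below shows one minimal way to do this with NumPy alone, assuming a mono float waveform. `fft_resample` is a hypothetical helper, not part of the dataset or of `datasets`. It treats the signal as periodic, so edge artifacts are possible on real recordings.

```python
import numpy as np


def fft_resample(x, orig_sr, target_sr):
    """Band-limited resampling by cropping or zero-padding the real FFT spectrum.

    A naive sketch: the whole waveform is transformed at once and treated as
    periodic, and the spectrum is cut with a brick-wall filter, which can
    cause ringing near sharp transients.
    """
    x = np.asarray(x, dtype=np.float64)
    n = len(x)
    m = int(round(n * target_sr / orig_sr))  # output length in samples
    spectrum = np.fft.rfft(x)
    # irfft with n=m uses only the first m // 2 + 1 bins: it crops high
    # frequencies when downsampling and zero-pads when upsampling.
    y = np.fft.irfft(spectrum, n=m)
    # numpy's inverse transform divides by the output length m rather than
    # the input length n, so rescale to preserve amplitude.
    return y * (m / n)
```

For 24 kHz to 16 kHz, `fft_resample(clip, 24000, 16000)` returns roughly two output samples for every three input samples. A polyphase filter such as `scipy.signal.resample_poly(x, 2, 3)` is a common alternative. The `datasets` library can also resample at decode time with `ds.cast_column('audio', datasets.Audio(sampling_rate=16000))`.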

For detailed information about the YODAS dataset, please refer to our paper and the espnet/yodas repo.

Usage:

Each data point corresponds to an entire YouTube video and contains the following fields:

  • video_id: unique ID of this video (note that this ID is not the YouTube video ID)
  • duration: total duration of this video, in seconds
  • audio
    • path: local path to the wav file in standard mode; empty in streaming mode
    • sampling_rate: fixed at 24 kHz (the sampling rate in espnet/yodas is 16 kHz)
    • array: waveform samples as floats
  • utterances
    • utt_id: unique ID of this utterance
    • text: transcription of this utterance
    • start: start timestamp of this utterance, in seconds
    • end: end timestamp of this utterance, in seconds
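Each record holds a whole video plus utterance-level timestamps. Utterance clips can therefore be cut from the long-form waveform by converting `start` and `end` from seconds to sample indices using `sampling_rate`. The sketch below assumes a decoded record with the fields listed above. `split_utterances` and `_as_list_of_dicts` are hypothetical helpers, not part of the dataset or the `datasets` library. Rounding to the nearest sample is an arbitrary choice. Because `datasets` can return a sequence of dict features as a dict of lists, the sketch accepts both shapes for `utterances`.

```python
import numpy as np


def _as_list_of_dicts(utterances):
    """Normalize `utterances` to a list of dicts.

    `datasets` decodes a Sequence of dict features as a dict of lists, while a
    plain list feature stays a list of dicts; accept both shapes.
    """
    if isinstance(utterances, dict):
        keys = list(utterances)
        return [dict(zip(keys, row)) for row in zip(*(utterances[k] for k in keys))]
    return list(utterances)


def split_utterances(record):
    """Yield (utt_id, text, clip) for every utterance in one long-form record."""
    audio = record["audio"]
    samples = np.asarray(audio["array"])
    sr = audio["sampling_rate"]
    for utt in _as_list_of_dicts(record["utterances"]):
        # Convert second-based timestamps to sample indices, clamped to the array.
        begin = max(0, int(round(utt["start"] * sr)))
        end = min(len(samples), int(round(utt["end"] * sr)))
        yield utt["utt_id"], utt["text"], samples[begin:end]
```

For example, `for utt_id, text, clip in split_utterances(record): ...` yields one clip per utterance, and `len(clip) / 24000` gives each clip's length in seconds.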

YODAS2 supports two loading modes:

standard mode: each subset is downloaded to the local disk before the first iteration.

```python
from datasets import load_dataset

# Note: this will take a very long time to download and preprocess.
# You can try a small subset for testing purposes.
ds = load_dataset('espnet/yodas2', 'en000')
print(next(iter(ds['train'])))
```

streaming mode: most of the files are streamed instead of being downloaded to your local device. This mode is useful for inspecting the dataset quickly.

```python
from datasets import load_dataset

# This streaming load finishes quickly.
ds = load_dataset('espnet/yodas2', 'en000', streaming=True)
```
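Streaming makes it possible to inspect data without downloading a whole subset. The hypothetical helper below is a sketch that assumes records shaped like the fields listed earlier. It tallies videos, utterances and audio hours over the first few items of any iterable, including a streamed split. It reads only `duration` and `utterances`, and it counts utterances whether they arrive as a list of dicts or as a dict of lists.

```python
from itertools import islice


def summarize(records, limit):
    """Count videos, utterances and audio hours over the first `limit` records."""
    videos = utterances = 0
    seconds = 0.0
    for record in islice(records, limit):
        videos += 1
        seconds += record["duration"]
        utts = record["utterances"]
        # A sequence of dicts may arrive as a dict of lists; count entries either way.
        utterances += len(utts["utt_id"]) if isinstance(utts, dict) else len(utts)
    return {"videos": videos, "utterances": utterances, "hours": seconds / 3600}
```

With the streaming `ds` created above, `summarize(ds['train'], 10)` looks only at the first ten videos. Iterating a streamed split may still decode each record's audio; the helper simply ignores it.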

Reference

@inproceedings{li2023yodas,
  title={Yodas: Youtube-Oriented Dataset for Audio and Speech},
  author={Li, Xinjian and Takamichi, Shinnosuke and Saeki, Takaaki and Chen, William and Shiota, Sayaka and Watanabe, Shinji},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}

Contact

If you have any questions, feel free to contact us at the following email address.

During downloading, we made sure that our dataset consisted only of videos with CC licenses. If you find that your video was unintentionally included in our dataset and would like it deleted, you can send a deletion request to the following email address.

Remove the parentheses () from the following email address:

(lixinjian)(1217)@gmail.com

Tags: license:cc-by-3.0, arxiv:2406.00899, region:us

âš™ī¸ Technical Specs

architecture
null
params billions
null
context length
null

📊 Engagement & Metrics

likes
45
downloads
55,573

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)