📊
Dataset

AudioSet

by agkphysics hf-dataset--agkphysics--audioset
Nexus Index
34.5 Top 100%
S: Semantic 50
A: Authority 0
P: Popularity 54
R: Recency 38
Q: Quality 30
Parquet Format
Dataset Information Summary
Entity Passport
Registry ID hf-dataset--agkphysics--audioset
License CC-BY-4.0
Provider huggingface
📜

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__agkphysics__audioset,
  author = {agkphysics},
  title = {AudioSet Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/agkphysics/audioset}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
agkphysics. (2026). AudioSet [Dataset]. Free2AITools. https://huggingface.co/datasets/agkphysics/audioset

Downloads
46,091

Dataset Specification

Dataset Card for AudioSet

Dataset Description

Dataset Summary

AudioSet is a dataset of 10-second clips from YouTube, annotated into one or more sound categories, following the AudioSet ontology.

Supported Tasks and Leaderboards

  • audio-classification: Classify audio clips into categories. The leaderboard is available here

Languages

The class labels in the dataset are in English.

Dataset Structure

Data Instances

Example instance from the dataset:

```python
{
    'video_id': '--PJHxphWEs',
    'audio': {
        'path': 'audio/bal_train/--PJHxphWEs.flac',
        'array': array([-0.04364824, -0.05268681, -0.0568949 , ...,  0.11446512,
                         0.14912748,  0.13409865]),
        'sampling_rate': 48000
    },
    'labels': ['/m/09x0r', '/t/dd00088'],
    'human_labels': ['Speech', 'Gush']
}
```

Data Fields

Instances have the following fields:

  • video_id: a string feature containing the original YouTube ID.
  • audio: an Audio feature containing the audio data and sample rate.
  • labels: a sequence of string features containing the labels associated with the audio clip.
  • human_labels: a sequence of string features containing the human-readable forms of the same labels as in labels.
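The fields above can be read together to summarise a record. The following is a minimal sketch, not part of the original card: `describe_instance` is a hypothetical helper, the `example` dictionary is a stand-in built from the instance shown earlier (with a placeholder array of zeros), and the pairing assumes that `labels` and `human_labels` list the same labels in the same order.

```python
def describe_instance(instance):
    """Summarise one record shaped like the example instance in this card."""
    audio = instance["audio"]
    # Clip duration in seconds: number of samples divided by the sampling rate.
    duration_s = len(audio["array"]) / audio["sampling_rate"]
    # Assumption: labels and human_labels are parallel sequences.
    label_pairs = list(zip(instance["labels"], instance["human_labels"]))
    return {
        "video_id": instance["video_id"],
        "duration_s": duration_s,
        "label_pairs": label_pairs,
    }


# Hypothetical stand-in record: a 10-second clip at 48 kHz filled with zeros.
example = {
    "video_id": "--PJHxphWEs",
    "audio": {
        "path": "audio/bal_train/--PJHxphWEs.flac",
        "array": [0.0] * 480000,
        "sampling_rate": 48000,
    },
    "labels": ["/m/09x0r", "/t/dd00088"],
    "human_labels": ["Speech", "Gush"],
}

summary = describe_instance(example)
print(summary)
```

With a real record, `audio["array"]` would typically be a NumPy array, which `len` handles in the same way for mono audio.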

Data Splits

The distribution of audio clips is as follows:

`balanced` configuration

| | train | test |
|---|---|---|
| # instances | 18683 | 17141 |

`unbalanced` configuration

| | train | test |
|---|---|---|
| # instances | 1738657 | 17141 |
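A configuration and split from the tables above could be selected roughly as follows. This is a sketch under assumptions: the repository ID is taken from the URL on this page and its casing may differ on the Hub, `load_split` and `total_rows` are hypothetical helper names, and the `datasets` library is imported lazily so that the row arithmetic runs without it.

```python
# Assumption: repository ID as it appears in this page's URL.
REPO_ID = "agkphysics/audioset"

# Split sizes copied from the tables in this section.
SPLIT_SIZES = {
    "balanced": {"train": 18683, "test": 17141},
    "unbalanced": {"train": 1738657, "test": 17141},
}


def total_rows(config):
    """Total number of instances across the splits of one configuration."""
    return sum(SPLIT_SIZES[config].values())


def load_split(config="balanced", split="train", streaming=True):
    """Load one split; streaming avoids downloading every shard up front."""
    from datasets import load_dataset  # requires the Hugging Face datasets library

    return load_dataset(REPO_ID, config, split=split, streaming=streaming)


print(total_rows("balanced"))
print(total_rows("unbalanced"))
```

Streaming is used as the default here because the `unbalanced` train split holds well over a million clips; passing `streaming=False` would download the data to the local cache instead.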

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

The labels are from the AudioSet ontology. Audio clips are from YouTube.

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

  1. The YouTube videos in this copy of AudioSet were downloaded in March 2023, so not all of the original clips are available. The number of clips that could be downloaded is as follows:
    • Balanced train: 18683 audio clips out of 22160 originally.
    • Unbalanced train: 1738788 clips out of 2041789 originally.
    • Evaluation: 17141 audio clips out of 20371 originally.
  2. Most audio is sampled at 48 kHz 24 bit, but about 10% is sampled at 44.1 kHz 24 bit. Audio files are stored in the FLAC format.
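The two limitations above can be made concrete with a short sketch. The counts are copied from the list; `retention` and `resample` are hypothetical helper names, and the 16 kHz target rate is an arbitrary illustrative choice, not something the card prescribes. The `resample` helper relies on `cast_column` with the `Audio` feature from the `datasets` library, which decodes audio at the requested rate on access.

```python
# (downloaded, original) clip counts from the list of known limitations.
DOWNLOADED = {
    "balanced_train": (18683, 22160),
    "unbalanced_train": (1738788, 2041789),
    "evaluation": (17141, 20371),
}


def retention(downloaded, original):
    """Fraction of the original clips present in this copy."""
    return downloaded / original


rates = {name: retention(*counts) for name, counts in DOWNLOADED.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")


def resample(dataset, sampling_rate=16000):
    """Give every clip one sampling rate (the data mixes 48 kHz and 44.1 kHz)."""
    from datasets import Audio  # requires the Hugging Face datasets library

    return dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))
```

Casting to a single rate is one way to avoid feeding a model a mix of 48 kHz and 44.1 kHz input; whether to resample, and to what rate, depends on the downstream model.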

Additional Information

Dataset Curators

More Information Needed

Licensing Information

The AudioSet data is licensed under CC-BY-4.0.

Citation

```bibtex
@inproceedings{jort_audioset_2017,
    title     = {Audio Set: An ontology and human-labeled dataset for audio events},
    author    = {Jort F. Gemmeke and Daniel P. W. Ellis and Dylan Freedman and Aren Jansen and Wade Lawrence and R. Channing Moore and Manoj Plakal and Marvin Ritter},
    year      = {2017},
    booktitle = {Proc. IEEE ICASSP 2017},
    address   = {New Orleans, LA}
}
```

📊 Structured Schema (Zero-Fabrication)

| Feature Key | Data Type |
|---|---|
| video_id | string |
| audio | Audio |
| labels | unknown |
| human_labels | unknown |

Estimated Rows: 35,824


Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--agkphysics--audioset
slug: agkphysics--audioset
source: huggingface
author: agkphysics
license: CC-BY-4.0
tags: task_categories:audio-classification, source_datasets:original, language:en, license:cc-by-4.0, size_categories:1m<n<10m, format:parquet, modality:audio, modality:text, library:datasets, library:dask, library:mlcroissant, library:polars, region:us, audio

📊 Engagement & Metrics

downloads: 46,091
stars: 94
forks: 0
