The YouTube videos in this copy of AudioSet were downloaded in March 2023, so not all of the original audio clips are still available. The number of clips that could be downloaded for each split is as follows:
Balanced train: 18683 audio clips out of 22160 originally.
Unbalanced train: 1738788 audio clips out of 2041789 originally.
Evaluation: 17141 audio clips out of 20371 originally.
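To put the counts above in perspective, the following minimal sketch computes the fraction of each split that was still downloadable. The dictionary and function names are illustrative only and are not part of any dataset tooling; the numbers are copied from the list above.

```python
# Clip counts copied from the list above: (downloaded, original).
SPLIT_COUNTS = {
    "Balanced train": (18683, 22160),
    "Unbalanced train": (1738788, 2041789),
    "Evaluation": (17141, 20371),
}


def retention_by_split(counts):
    """Return the fraction of original clips that were downloaded, per split."""
    return {name: got / total for name, (got, total) in counts.items()}


if __name__ == "__main__":
    for name, fraction in retention_by_split(SPLIT_COUNTS).items():
        print(f"{name}: {fraction:.1%} of the original clips")
```

Each split retains roughly the same share of its original clips, so the relative sizes of the splits are close to those of the original release.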
Most audio is sampled at 48 kHz with a 24-bit depth, but about 10% is sampled at 44.1 kHz with a 24-bit depth. All audio files are stored in the FLAC format.
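Because the sample rate is not uniform, it can be useful to check each file before batching or resampling. The sketch below reads the sample rate, channel count, and bit depth from the STREAMINFO block at the start of a FLAC file using only the Python standard library. The function names and the file path in the usage note are hypothetical, and an audio library could do the same job if one is already installed.

```python
def parse_flac_streaminfo(header: bytes) -> dict:
    """Decode the STREAMINFO block from the first 26+ bytes of a FLAC file."""
    if header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Metadata block header: 1 bit "last block" flag, 7 bits type, 24 bits length.
    block_type = header[4] & 0x7F
    block_length = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_length < 34:
        raise ValueError("first metadata block is not a valid STREAMINFO")
    if len(header) < 26:
        raise ValueError("header too short")
    # STREAMINFO starts at byte 8. After 10 bytes of block/frame sizes come
    # 64 bits: 20 sample rate, 3 channels-1, 5 bits-1, 36 total samples.
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def read_flac_info(path: str) -> dict:
    """Open a FLAC file and return its STREAMINFO fields."""
    with open(path, "rb") as f:
        return parse_flac_streaminfo(f.read(42))
```

For example, `read_flac_info("some_clip.flac")["sample_rate"]` (with a hypothetical file name) should return 48000 or 44100 for the files described above, which makes it straightforward to select the clips that need resampling.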
@inproceedings{jort_audioset_2017,
title = {Audio Set: An ontology and human-labeled dataset for audio events},
author = {Jort F. Gemmeke and Daniel P. W. Ellis and Dylan Freedman and Aren Jansen and Wade Lawrence and R. Channing Moore and Manoj Plakal and Marvin Ritter},
year = {2017},
booktitle = {Proc. IEEE ICASSP 2017},
address = {New Orleans, LA}
}