📊
Dataset

E Gmd

by Schism Audio hf-dataset--schism-audio--e-gmd
Free2AITools Nexus Index
59.8 Top 100%
S: Semantic 50
A: Authority 61
P: Popularity 51
R: Recency 96
Q: Quality 50
Entity Passport
Registry ID hf-dataset--schism-audio--e-gmd
License CC-BY-4.0
Provider huggingface
📜

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__schism_audio__e_gmd,
  author = {Schism Audio},
  title = {E Gmd Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/schism-audio/e-gmd}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Schism Audio. (2026). E Gmd [Dataset]. Free2AITools. https://huggingface.co/datasets/schism-audio/e-gmd

🔬 Technical Deep Dive

⬇️
Downloads
31,011


Dataset Specification

Expanded Groove MIDI Dataset (E-GMD)

This repository mirrors version 1.0.0 of Google's Expanded Groove MIDI Dataset (E-GMD) for access through the Hugging Face Hub.

E-GMD is a large dataset of human drum performances with audio recordings annotated in MIDI. It contains 444.5 hours of audio from 43 drum kits, with the same train, validation, and test split definitions as the original Groove MIDI Dataset.

Quick Start

python
from datasets import load_dataset

ds = load_dataset("schism-audio/e-gmd", split="train", streaming=True)
first = next(iter(ds))
print(first["audio"], first["midi_path"], first["split"])

Repository Layout

The original archive contains some session folders with more than 10,000 files, which exceeds Hugging Face Hub's per-folder repository limit. This mirror keeps the original filenames and original drummer/session paths, but stages files under split and kit directories:

text
audio/{split}/{kit_slug}/{original_drummer/session/path}.wav
midi/{split}/{kit_slug}/{original_drummer/session/path}.midi
metadata.csv
metadata/{split}.csv
metadata/all.csv
e-gmd-v1.0.0.csv
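
As a rough illustration of the layout above, the following sketch splits a repository-relative path into its components. The `MirrorPath` type, the `parse_mirror_path` helper, and the example path are all hypothetical names made up for this example; they are not part of the dataset or of any library.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class MirrorPath(NamedTuple):
    kind: str           # "audio" or "midi"
    split: str          # split directory, e.g. "train"
    kit_slug: str       # kit directory name used by this mirror
    original_path: str  # original drummer/session path, extension included


def parse_mirror_path(path: str) -> MirrorPath:
    """Split a repository-relative path into the layout's components."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[0] not in ("audio", "midi"):
        raise ValueError(f"not a staged audio/MIDI path: {path!r}")
    kind, split, kit_slug, *rest = parts
    return MirrorPath(kind, split, kit_slug, "/".join(rest))


# Hypothetical path, invented only to match the pattern above.
example = "audio/train/example_kit/drummer1/session1/example_take.wav"
parsed = parse_mirror_path(example)
print(parsed.split, parsed.kit_slug, parsed.original_path)
```

In practice the `midi_path` column is likely the safer way to find the paired MIDI file, since it does not rely on any assumption about how audio and MIDI filenames correspond.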

The root metadata.csv follows the AudioFolder convention and contains one row per WAV file. The file_name column points at the audio file, and the remaining columns preserve E-GMD metadata plus the paired MIDI path.

The metadata/*.csv files are retained from the original mirror and add these path columns:

  • file_name: audio path relative to the split folder
  • audio_path: audio path relative to the repository root
  • midi_path: paired MIDI path relative to the repository root
  • original_audio_filename: original archive audio path
  • original_midi_filename: original archive MIDI path
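
A minimal sketch of how these columns might be used, assuming the CSV files also carry the `split` column listed in the schema. The two sample rows and the `audio_midi_pairs` helper are hypothetical; real rows carry more columns and real paths.

```python
import csv
import io

# Hypothetical two-row excerpt in the shape described above.
SAMPLE_CSV = """file_name,audio_path,midi_path,split
example_kit/drummer1/session1/take_a.wav,audio/train/example_kit/drummer1/session1/take_a.wav,midi/train/example_kit/drummer1/session1/take_a.midi,train
example_kit/drummer1/session1/take_b.wav,audio/test/example_kit/drummer1/session1/take_b.wav,midi/test/example_kit/drummer1/session1/take_b.midi,test
"""


def audio_midi_pairs(csv_file, split):
    """Return (audio_path, midi_path) tuples for rows in the given split."""
    reader = csv.DictReader(csv_file)
    return [
        (row["audio_path"], row["midi_path"])
        for row in reader
        if row["split"] == split
    ]


pairs = audio_midi_pairs(io.StringIO(SAMPLE_CSV), "train")
print(pairs)
```

With a local copy of the repository, the same helper could be given an open file handle for `metadata/train.csv` in place of the in-memory sample.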

Loading Audio With Metadata

The default dataset is loadable with datasets through the AudioFolder convention:

python
from datasets import load_dataset

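train = load_dataset("schism-audio/e-gmd", split="train", streaming=True)
# A streaming dataset is iterable, not indexable, so take the first row with next(iter(...)).
first = next(iter(train))
print(first["audio"], first["midi_path"])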

The official train/test/validation split is preserved in the split column. The paired MIDI file for each row is available through midi_path.

Splits

| Split | Unique sequences | Total sequences | Duration |
|---|---|---|---|
| Train | 819 | 35,217 | 341.4 hours |
| Test | 123 | 5,289 | 50.9 hours |
| Validation | 117 | 5,031 | 52.2 hours |
| Total | 1,059 | 45,537 | 444.5 hours |
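
The totals in the table can be checked with a few lines of arithmetic. The sketch below copies the per-split figures from the table; the per-split share and mean sequence length are derived values that do not appear in the original card.

```python
# (unique sequences, total sequences, duration in hours), copied from the table above.
SPLITS = {
    "train": (819, 35_217, 341.4),
    "test": (123, 5_289, 50.9),
    "validation": (117, 5_031, 52.2),
}
NUM_KITS = 43  # kit count stated in the dataset description

total_unique = sum(u for u, _, _ in SPLITS.values())
total_sequences = sum(n for _, n, _ in SPLITS.values())
# Round to one decimal to absorb floating-point error in the sum.
total_hours = round(sum(h for _, _, h in SPLITS.values()), 1)
print(total_unique, total_sequences, total_hours)

for name, (unique, n, hours) in SPLITS.items():
    share = 100 * hours / total_hours
    mean_seconds = 3600 * hours / n
    per_kit = n == unique * NUM_KITS
    print(f"{name}: {share:.1f}% of audio, {mean_seconds:.1f} s mean, unique x kits matches: {per_kit}")
```

In every split the total sequence count equals the unique count times 43, which is consistent with each unique sequence appearing once per drum kit.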

License

The dataset is made available by Google LLC under the Creative Commons Attribution 4.0 International license (CC BY 4.0).

Source

Citation

If you use this dataset, cite the original E-GMD paper and specify version 1.0.0:

bibtex
@misc{callender2020improving,
    title={Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset},
    author={Lee Callender and Curtis Hawthorne and Jesse Engel},
    year={2020},
    eprint={2004.00188},
    archivePrefix={arXiv},
    primaryClass={cs.SD}
}

📊 Structured Schema (Zero-Fabrication)

| Feature Key | Data Type |
|---|---|
| drummer | string |
| session | string |
| id | string |
| style | string |
| bpm | int64 |
| beat_type | string |
| time_signature | string |
| duration | float64 |
| split | string |
| kit_name | string |
| audio | Audio |
| midi_path | string |
| audio_path | string |
| original_midi_filename | string |
| original_audio_filename | string |

Estimated Rows: 45,537
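
One way to sanity-check a loaded row against the schema above is sketched below. The `schema_problems` helper and the sample row are hypothetical, and the mapping from `string`, `int64`, and `float64` to Python's `str`, `int`, and `float` is an assumption about how rows are decoded. The `audio` value is only checked for presence, because its decoded type depends on the `datasets` version.

```python
# Assumed Python types for the scalar features listed in the schema.
SCALAR_TYPES = {
    "drummer": str, "session": str, "id": str, "style": str,
    "bpm": int, "beat_type": str, "time_signature": str,
    "duration": float, "split": str, "kit_name": str,
    "midi_path": str, "audio_path": str,
    "original_midi_filename": str, "original_audio_filename": str,
}


def schema_problems(row):
    """Return a list of human-readable mismatches between a row and the schema."""
    problems = []
    if "audio" not in row:
        problems.append("missing key: audio")
    for key, expected in SCALAR_TYPES.items():
        if key not in row:
            problems.append(f"missing key: {key}")
        elif not isinstance(row[key], expected):
            found = type(row[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {found}")
    return problems


# Hypothetical row with placeholder values; not taken from the dataset.
sample_row = {key: "placeholder" for key, t in SCALAR_TYPES.items() if t is str}
sample_row.update({"bpm": 120, "duration": 12.5, "audio": object()})
print(schema_problems(sample_row))
```

An empty list means every expected key is present with the assumed type.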


🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

tags
task_categories:audio-classification, license:cc-by-4.0, size_categories:10k<n<100k, format:audiofolder, modality:audio, modality:text, library:datasets, library:mlcroissant, arxiv:2004.00188, region:us, midi, drums, percussion, drum-transcription, music-information-retrieval, automatic-drum-transcription
