Syntheory Plus
| Entity Passport | |
| --- | --- |
| Registry ID | hf-dataset--derekxkwan--syntheory_plus |
| License | MIT |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
```bibtex
@misc{hf_dataset__derekxkwan__syntheory_plus,
  author = {derekxkwan},
  title = {Syntheory Plus Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/derekxkwan/syntheory_plus}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
```
Dataset Specification
- Status: uploading/documentation in progress
- Dataset Authors: Derek Kwan and Patrick Donnelly
- Related Paper: TBA
Dataset Creation
Rationale
This dataset was created to isolate specific Western music-theoretic concepts in order to probe Meta's MusicGen and OpenAI's Jukebox generative music models. Both the experiments and, more importantly here, the dataset itself are based on the work of Wei et al. in their creation of the SynTheory dataset.
Dataset Creation
This dataset was created following the process outlined in Wei et al.'s *Do Music Generation Models Encode Music Theory?*: audio was rendered from programmatically generated MIDI files using the rustysynth library with the TimGM6mb soundfont.
Dataset Overview
Dataset Descriptions
Polyrhythms
Dataset Fields
Polyrhythms
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst1` (int): The name of the first MIDI instrument used to realize the musical example
- `inst2` (int): The name of the second MIDI instrument used to realize the musical example
- `bpm` (int): The BPM (beats per minute) the musical example is realized at
- `num_bars` (int): The number of 4/4 bars contained in the musical example. Each iteration of a polyrhythm occurs over the duration of one bar
- `poly` (str): The polyrhythm of the musical example in the format `{first_rate}a{second_rate}` (`a` for "against"). For example, `4a5` denotes "4 against 5"
- `pair` (str): The pair of MIDI instruments used in the musical example in the format `{inst1}_{inst2}`
- `ratio` (float): The ratio of the rates in the polyrhythm, computed as `first_rate / second_rate`. For example, `4a5` has a ratio of `0.8`
- `norm_ratio` (float): The ratio of the rates mapped so that the smallest ratio in the dataset is `0` and the largest is `1`. This is achieved with the expression `(ratio - smallest_ratio) / (largest_ratio - smallest_ratio)`
- `offset_lvl` (int): Denotes an offset in starting time of the musical example in the WAV file. There are five possible 0-indexed offsets with millisecond values of 0, 120, 204, 700, and 1067. These values were generated using a NumPy random number generator
- `offset_ms` (int): The offset in milliseconds described by `offset_lvl`
- `rvb_lvl` (int): One of three levels of reverb applied to the MIDI realization. These are 0-indexed and correspond to MIDI values of 0, 63, and 127
- `rvb_val` (int): The MIDI reverb value described by `rvb_lvl`
- `poly1`: The relative rate of the first MIDI instrument in the musical example
- `poly2`: The relative rate of the second MIDI instrument in the musical example
- `polydist`: The absolute integer distance between the rates of the two instruments. For example, the polyrhythm "5 against 11" has distance 6
- `label_idx`: Each polyrhythm is mapped to a 0-indexed index for downstream classification tasks
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
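The derived fields above (`ratio`, `norm_ratio`, `polydist`) follow directly from the `poly` label. A minimal sketch in Python, with hypothetical function names; the dataset's actual smallest and largest ratios are not stated here, so they are passed in as parameters:

```python
def parse_poly(poly: str) -> tuple[int, int]:
    # "4a5" -> (4, 5); "a" stands for "against"
    first, second = poly.split("a")
    return int(first), int(second)

def poly_fields(poly: str, smallest_ratio: float, largest_ratio: float) -> dict:
    # Derive the ratio-based fields described above from a poly label
    poly1, poly2 = parse_poly(poly)
    ratio = poly1 / poly2
    # Min-max normalization over the ratios present in the dataset
    norm_ratio = (ratio - smallest_ratio) / (largest_ratio - smallest_ratio)
    polydist = abs(poly1 - poly2)
    return {"poly1": poly1, "poly2": poly2, "ratio": ratio,
            "norm_ratio": norm_ratio, "polydist": polydist}
```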
Dynamics
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `dyn1`: One of two dynamics characterizing a dynamic expressive pattern. In non-`flat` patterns, this is the softer of the two dynamics. In `flat` patterns, this is the same as `dyn2`
- `dyn2`: One of two dynamics characterizing a dynamic expressive pattern. In non-`flat` patterns, this is the louder of the two dynamics. In `flat` patterns, this is the same as `dyn1`
- `dyn_pair`: A string formatted as `{dyn1}-{dyn2}`
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `inflection_point`: The beat of the 4/4 bar (1-indexed) where the dynamic direction changes for `subp` (drops in dynamic), `subf` (increases in dynamic), `hairpin` (changes from crescendo to decrescendo), and `revhairpin` (changes from decrescendo to crescendo)
- `dyn_category`: The type of dynamic expressive pattern (`flat`, `decresc`, `cresc`, `subp`, `subf`, `hairpin`, `revhairpin`)
- `dyn_subcategory`: The type of dynamic expressive pattern with the inflection point appended with a `-` where applicable. For example, `subf` with an inflection point on beat 3 becomes `subf-3`
- `rvb_lvl` (int): One of three levels of reverb applied to the MIDI realization. These are 0-indexed and correspond to MIDI values of 0, 63, and 127
- `rvb_val` (int): The MIDI reverb value described by `rvb_lvl`
- `offset_lvl` (int): Denotes an offset in starting time of the musical example in the WAV file. There are three possible 0-indexed offsets with millisecond values of 0, 264, and 431. These values were generated using a NumPy random number generator
- `offset_ms` (int): The offset in milliseconds described by `offset_lvl`
- `beats_per_bar` (int): Number of beats per bar in a musical example. Each bar is in 4/4 time, so this value is 4 for all musical examples
- `num_bars` (int): Number of bars in each musical example. This is 1 for all musical examples
- `beat_subdiv` (int): Denotes the number of equally-sized subdivisions that make up the stream of notes realizing each musical example. This value ranges from 2 (eighth notes) to 8 (32nd notes)
- `bpm` (int): The BPM that a musical example is realized at. This is 60 BPM for all musical examples in this sub-dataset
- `num_beats` (int): Number of beats in a musical example. As each musical example is one bar of 4/4 at 60 BPM, this is 4 beats for all examples
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
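The `fold`-to-`set_type` assignment described above (14 `train` / 3 `valid` / 3 `test` over 20 folds) is shared by every sub-dataset. A minimal sketch; the function name is mine, not part of the dataset:

```python
def fold_to_set_type(fold: int) -> str:
    # 1-indexed folds: 1-14 -> train, 15-17 -> valid, 18-20 -> test
    if not 1 <= fold <= 20:
        raise ValueError("fold must be in 1..20")
    if fold <= 14:
        return "train"
    if fold <= 17:
        return "valid"
    return "test"
```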
Seventh Chords
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `root` (str): The root of the seventh chord alongside its octave, with "s" denoting sharp. For example, "c4" is middle C while "as3" is the A# below that
- `pitch` (str): The root of the seventh chord without its octave (just the pitch)
- `octave` (int): The octave of the root
- `quality` (str): The quality of the seventh chord
- `quality_idx` (int): The quality of the seventh chord mapped to a 0-indexed index for classification tasks
- `inv` (int): The inversion of the seventh chord (0-indexed, where root position is 0)
- `bpm` (int): The BPM a musical example is realized at. For all examples, this is 60 BPM
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
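The `pitch` and `octave` fields above are simple decompositions of `root`. A minimal sketch, assuming single-digit octaves as in the examples; the function name is mine:

```python
def parse_root(root: str) -> tuple[str, int]:
    # "c4" -> ("c", 4); "as3" -> ("as", 3). A trailing "s" in the pitch
    # denotes sharp. Assumes the octave is a single trailing digit.
    return root[:-1], int(root[-1])
```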
Mode Mixture
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `key_center` (str): The pitch key center of a chord progression alongside its octave, with "s" denoting sharp. For example, "c4" is middle C while "as3" is the A# below that
- `scale_type` (str): The primary mode that each example is realized in. As we only focus on the major mode, each musical example has the `maj` scale type
- `is_modemix` (bool): `false` for no mode mixture in the chord progression, `true` for mode mixture
- `orig_prog` (str): The roman numerals of each chord as integer scale degrees, prefixed by the original chord progression's mode without mode mixture. For example, both the major and mode-mixture variants of "I-vi-ii-V" have `maj-1625` in this field
- `sub_prog` (str): The same as `orig_prog`, except mode-mixture variants are prefixed with `mm1`. Thus, the major variant of "I-vi-ii-V" has `maj-1625` in this field while the mode-mixture variant has `mm1-1625`
- `inv` (int): The inversion (0-indexed) that each chord is in. As each chord is in root position for all musical examples, this is 0 for all
- `bpm` (int): The BPM that each musical example is realized at. For all musical examples, this is 60 BPM
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
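The `orig_prog`/`sub_prog` labeling above can be sketched as a small helper; the function name is hypothetical, and only the major-mode case documented above is covered:

```python
def prog_labels(degrees: str, is_modemix: bool) -> tuple[str, str]:
    # Build (orig_prog, sub_prog) for a major-mode progression, e.g.
    # degrees="1625" for I-vi-ii-V. Mode-mixture variants change the
    # sub_prog prefix from "maj" to "mm1"; orig_prog keeps "maj".
    orig_prog = f"maj-{degrees}"
    sub_prog = f"mm1-{degrees}" if is_modemix else orig_prog
    return orig_prog, sub_prog
```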
Secondary Dominants
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `key_center` (str): The pitch key center of a chord progression alongside its octave, with "s" denoting sharp. For example, "c4" is middle C while "as3" is the A# below that
- `scale_type` (str): The primary mode that each example is realized in. Major chord progressions are `maj` while minor chord progressions are `min`
- `sub_type` (str): The type of chord progression that characterizes each musical example. `N` denotes a diatonic chord progression, `S` denotes a chord progression with a chord replaced by the secondary dominant of the following chord, and `T` denotes a similar replacement but with the tritone substitution of the secondary dominant
- `scale_sub_type` (str): The `scale_type` concatenated with the `sub_type` as `{scale_type}_{sub_type}`
- `base_prog` (str): The roman numerals of each chord as integer scale degrees
- `orig_prog` (str): The `base_prog` string prefixed by the `scale_type`, denoting the chord progression and the mode it occurs in. For example, "i7-VI(maj7)-ii(halfdim7)-V7" in minor is `min-1625`
- `inv` (int): The inversion (0-indexed) that each chord is in. As each chord is in root position for all musical examples, this is 0 for all
- `sub_prog` (str): The `orig_prog` string suffixed by the `sub_type`. For example, a tritone substitution for "i7-VI(maj7)-ii(halfdim7)-V7" in minor is `min-1625-T`
- `sub_pos` (int): The position in the chord progression where the chord substitution occurs (1-indexed). For example, every chord progression with a substitution in this sub-dataset substitutes the 2nd of 4 chords and thus has the value 2. Chord progressions where no substitution occurs (`N` types) have the value -1
- `bpm` (int): The BPM a musical example is realized at. For all examples, this is 60 BPM
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
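Following the `min-1625-T` example above, the derived label fields can be sketched as a helper. The function name is mine; whether `N`-type examples also carry a suffix is my assumption (the text implies every `sub_prog` is suffixed), and `sub_pos` is hardcoded because this sub-dataset always substitutes the 2nd of 4 chords:

```python
def sd_labels(scale_type: str, base_prog: str, sub_type: str) -> dict:
    # scale_type: "maj" or "min"; sub_type: "N" (diatonic),
    # "S" (secondary dominant), or "T" (tritone substitution)
    orig_prog = f"{scale_type}-{base_prog}"
    return {
        "scale_sub_type": f"{scale_type}_{sub_type}",
        "orig_prog": orig_prog,
        # Per the "min-1625-T" example, append the one-letter sub_type
        "sub_prog": f"{orig_prog}-{sub_type}",
        # Substitutions always replace the 2nd of 4 chords here;
        # N types (no substitution) get -1
        "sub_pos": -1 if sub_type == "N" else 2,
    }
```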
đ Identity & Source
- id: hf-dataset--derekxkwan--syntheory_plus
- slug: derekxkwan--syntheory_plus
- source: huggingface
- author: derekxkwan
- license: MIT
- tags: license:mit, size_categories:100k<n<1m, region:us, music
đ Engagement & Metrics
- downloads: 123,557
- stars: 0
- forks: 0