Syntheory Plus
| Entity Passport | |
| --- | --- |
| Registry ID | hf-dataset--derekxkwan--syntheory_plus |
| License | MIT |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
```bibtex
@misc{hf_dataset__derekxkwan__syntheory_plus,
  author = {derekxkwan},
  title = {Syntheory Plus Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/derekxkwan/syntheory_plus}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
```
Dataset Specification
- Status: uploading/documentation in progress
- Dataset Authors: Derek Kwan and Patrick Donnelly
- Related Paper: TBA
Dataset Creation
Rationale
This dataset was created to isolate specific Western music-theoretic concepts in order to probe Meta's MusicGen and OpenAI's Jukebox generative music models. Both the experiments and, more importantly here, the dataset itself are based on the work of Wei et al. in their creation of the SynTheory dataset.
Dataset Creation
This dataset was created following the process outlined in Wei et al.'s *Do Music Generation Models Encode Music Theory?*: audio was rendered from programmatically generated MIDI files using the rustysynth library with the TimGM6mb soundfont.
Dataset Overview
Dataset Descriptions
Polyrhythms
Dataset Fields
Polyrhythms
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst1` (int): The name of the first MIDI instrument used to realize the musical example
- `inst2` (int): The name of the second MIDI instrument used to realize the musical example
- `bpm` (int): The BPM (beats per minute) the musical example is realized at
- `num_bars` (int): The number of 4/4 bars contained in the musical example. Each iteration of a polyrhythm occurs over the duration of one bar
- `poly` (str): The polyrhythm of the musical example in the format `{first_rate}a{second_rate}` (`a` for "against"). For example, `4a5` denotes "4 against 5"
- `pair` (str): The pair of MIDI instruments used in the musical example in the format `{inst1}_{inst2}`
- `ratio` (float): The ratio of the rates in the polyrhythm, computed as `first_rate / second_rate`. For example, `4a5` has a ratio of `0.8`
- `norm_ratio` (float): The ratio of the rates mapped so that the smallest ratio in the dataset is `0` and the largest is `1`. This is achieved with the expression `(ratio - smallest_ratio) / (largest_ratio - smallest_ratio)`
- `offset_lvl` (int): Denotes an offset in starting time of the musical example in the WAV file. There are five possible 0-indexed offsets with millisecond values of 0, 120, 204, 700, and 1067. These values were generated using a NumPy random number generator
- `offset_ms` (int): The offset in milliseconds described by `offset_lvl`
- `rvb_lvl` (int): One of three levels of reverb applied to the MIDI realization. These are 0-indexed and correspond to MIDI values of 0, 63, and 127
- `rvb_val` (int): The MIDI reverb value described by `rvb_lvl`
- `poly1`: The relative rate of the first MIDI instrument in the musical example
- `poly2`: The relative rate of the second MIDI instrument in the musical example
- `polydist`: The absolute integer distance between the rates of the two instruments. For example, the polyrhythm "5 against 11" has distance 6
- `label_idx`: Each polyrhythm is mapped to a 0-indexed index for downstream classification tasks
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
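The derived fields above (`ratio`, `norm_ratio`, `polydist`) follow directly from the `poly` label. A minimal sketch in Python, with hypothetical function names; the dataset's actual smallest and largest ratios are not stated here, so they are passed in as parameters:

```python
def parse_poly(poly: str) -> tuple[int, int]:
    # "4a5" -> (4, 5); "a" stands for "against"
    first, second = poly.split("a")
    return int(first), int(second)

def poly_fields(poly: str, smallest_ratio: float, largest_ratio: float) -> dict:
    # Derive the ratio-based fields described above from a poly label
    poly1, poly2 = parse_poly(poly)
    ratio = poly1 / poly2
    # Min-max normalization over the ratios present in the dataset
    norm_ratio = (ratio - smallest_ratio) / (largest_ratio - smallest_ratio)
    polydist = abs(poly1 - poly2)
    return {"poly1": poly1, "poly2": poly2, "ratio": ratio,
            "norm_ratio": norm_ratio, "polydist": polydist}
```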
Dynamics
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `dyn1`: One of two dynamics characterizing a dynamic expressive pattern. In non-`flat` patterns, this is the softer of the two dynamics. In `flat` patterns, this is the same as `dyn2`
- `dyn2`: One of two dynamics characterizing a dynamic expressive pattern. In non-`flat` patterns, this is the louder of the two dynamics. In `flat` patterns, this is the same as `dyn1`
- `dyn_pair`: A string formatted as `{dyn1}-{dyn2}`
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `inflection_point`: The beat of the 4/4 bar (1-indexed) where the dynamic direction changes for `subp` (drops in dynamic), `subf` (increases in dynamic), `hairpin` (changes from crescendo to decrescendo), and `revhairpin` (changes from decrescendo to crescendo)
- `dyn_category`: The type of dynamic expressive pattern (`flat`, `decresc`, `cresc`, `subp`, `subf`, `hairpin`, `revhairpin`)
- `dyn_subcategory`: The type of dynamic expressive pattern with the inflection point appended with a `-` where applicable. For example, `subf` with an inflection point on beat 3 becomes `subf-3`
- `rvb_lvl` (int): One of three levels of reverb applied to the MIDI realization. These are 0-indexed and correspond to MIDI values of 0, 63, and 127
- `rvb_val` (int): The MIDI reverb value described by `rvb_lvl`
- `offset_lvl` (int): Denotes an offset in starting time of the musical example in the WAV file. There are three possible 0-indexed offsets with millisecond values of 0, 264, and 431. These values were generated using a NumPy random number generator
- `offset_ms` (int): The offset in milliseconds described by `offset_lvl`
- `beats_per_bar` (int): Number of beats per bar in a musical example. Each bar is in 4/4 time, so this value is 4 for all musical examples
- `num_bars` (int): Number of bars in each musical example. This is 1 for all musical examples
- `beat_subdiv` (int): Denotes the number of equally-sized subdivisions that make up the stream of notes realizing each musical example. This value ranges from 2 (eighth notes) to 8 (32nd notes)
- `bpm` (int): The BPM that a musical example is realized at. This is 60 BPM for all musical examples in this sub-dataset
- `num_beats` (int): Number of beats in a musical example. As each musical example is one bar of 4/4 at 60 BPM, this is 4 beats for all examples
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
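The `fold`-to-`set_type` assignment described above (14 `train` / 3 `valid` / 3 `test` over 20 folds) is shared by every sub-dataset. A minimal sketch; the function name is mine, not part of the dataset:

```python
def fold_to_set_type(fold: int) -> str:
    # 1-indexed folds: 1-14 -> train, 15-17 -> valid, 18-20 -> test
    if not 1 <= fold <= 20:
        raise ValueError("fold must be in 1..20")
    if fold <= 14:
        return "train"
    if fold <= 17:
        return "valid"
    return "test"
```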
Seventh Chords
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `root` (str): The root of the seventh chord alongside its octave, with "s" denoting sharp. For example, "c4" is middle C while "as3" is the A# below that
- `pitch` (str): The root of the seventh chord without its octave (just the pitch)
- `octave` (int): The octave of the root
- `quality` (str): The quality of the seventh chord
- `quality_idx` (int): The quality of the seventh chord mapped to a 0-indexed index for classification tasks
- `inv` (int): The inversion of the seventh chord (0-indexed, where root position is 0)
- `bpm` (int): The BPM a musical example is realized at. For all examples, this is 60 BPM
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
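The `pitch` and `octave` fields above are simple decompositions of `root`. A minimal sketch, assuming single-digit octaves as in the examples; the function name is mine:

```python
def parse_root(root: str) -> tuple[str, int]:
    # "c4" -> ("c", 4); "as3" -> ("as", 3). A trailing "s" in the pitch
    # denotes sharp. Assumes the octave is a single trailing digit.
    return root[:-1], int(root[-1])
```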
Mode Mixture
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `key_center` (str): The pitch key center of a chord progression alongside its octave, with "s" denoting sharp. For example, "c4" is middle C while "as3" is the A# below that
- `scale_type` (str): The primary mode that each example is realized in. As we only focus on the major mode, each musical example has the `maj` scale type
- `is_modemix` (bool): `false` for no mode mixture in the chord progression, `true` for mode mixture
- `orig_prog` (str): The roman numerals of each chord as integer scale degrees, prefixed by the original chord progression's mode without mode mixture. For example, both the major and mode-mixture variants of "I-vi-ii-V" have `maj-1625` in this field
- `sub_prog` (str): The same as `orig_prog`, except mode-mixture variants are prefixed with `mm1`. Thus, the major variant of "I-vi-ii-V" has `maj-1625` in this field while the mode-mixture variant has `mm1-1625`
- `inv` (int): The inversion (0-indexed) that each chord is in. As each chord is in root position for all musical examples, this is 0 for all
- `bpm` (int): The BPM that each musical example is realized at. For all musical examples, this is 60 BPM
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
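The `orig_prog`/`sub_prog` labeling above can be sketched as a small helper; the function name is hypothetical, and only the major-mode case documented above is covered:

```python
def prog_labels(degrees: str, is_modemix: bool) -> tuple[str, str]:
    # Build (orig_prog, sub_prog) for a major-mode progression, e.g.
    # degrees="1625" for I-vi-ii-V. Mode-mixture variants change the
    # sub_prog prefix from "maj" to "mm1"; orig_prog keeps "maj".
    orig_prog = f"maj-{degrees}"
    sub_prog = f"mm1-{degrees}" if is_modemix else orig_prog
    return orig_prog, sub_prog
```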
Secondary Dominants
- `name` (str): The filename of the musical example (without the `.wav` extension)
- `inst` (str): The name of the MIDI instrument used to realize the musical example
- `key_center` (str): The pitch key center of a chord progression alongside its octave, with "s" denoting sharp. For example, "c4" is middle C while "as3" is the A# below that
- `scale_type` (str): The primary mode that each example is realized in. Major chord progressions are `maj` while minor chord progressions are `min`
- `sub_type` (str): The type of chord progression that characterizes each musical example. `N` denotes a diatonic chord progression, `S` denotes a chord progression with a chord replaced by the secondary dominant of the following chord, and `T` denotes a similar replacement but with the tritone substitution of the secondary dominant
- `scale_sub_type` (str): The `scale_type` concatenated with the `sub_type` as `{scale_type}_{sub_type}`
- `base_prog` (str): The roman numerals of each chord as integer scale degrees
- `orig_prog` (str): The `base_prog` string prefixed by the `scale_type`, denoting the chord progression and the mode it occurs in. For example, "i7-VI(maj7)-ii(halfdim7)-V7" in minor is `min-1625`
- `inv` (int): The inversion (0-indexed) that each chord is in. As each chord is in root position for all musical examples, this is 0 for all
- `sub_prog` (str): The `orig_prog` string suffixed by the `sub_type`. For example, a tritone substitution for "i7-VI(maj7)-ii(halfdim7)-V7" in minor is `min-1625-T`
- `sub_pos` (int): The position in the chord progression where the chord substitution occurs (1-indexed). For example, every chord progression with a substitution in this sub-dataset substitutes the 2nd of 4 chords and thus has the value 2. Chord progressions where no substitution occurs (`N` types) have the value -1
- `bpm` (int): The BPM a musical example is realized at. For all examples, this is 60 BPM
- `fold` (int): The fold the musical example belongs to (1-indexed). The musical example can be found in the sub-dataset's `fold_{fold}` subfolder; there are 20 folds
- `set_type` (str): Denotes whether the musical example's fold is meant for training (`train`), testing (`test`), or validation (`valid`). For the original paper's experiments, the first 14 folds are assigned to `train`, the next three to `valid`, and the next three to `test`
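Following the `min-1625-T` example above, the derived label fields can be sketched as a helper. The function name is mine; whether `N`-type examples also carry a suffix is my assumption (the text implies every `sub_prog` is suffixed), and `sub_pos` is hardcoded because this sub-dataset always substitutes the 2nd of 4 chords:

```python
def sd_labels(scale_type: str, base_prog: str, sub_type: str) -> dict:
    # scale_type: "maj" or "min"; sub_type: "N" (diatonic),
    # "S" (secondary dominant), or "T" (tritone substitution)
    orig_prog = f"{scale_type}-{base_prog}"
    return {
        "scale_sub_type": f"{scale_type}_{sub_type}",
        "orig_prog": orig_prog,
        # Per the "min-1625-T" example, append the one-letter sub_type
        "sub_prog": f"{orig_prog}-{sub_type}",
        # Substitutions always replace the 2nd of 4 chords here;
        # N types (no substitution) get -1
        "sub_pos": -1 if sub_type == "N" else 2,
    }
```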
đ Identity & Source
- id: hf-dataset--derekxkwan--syntheory_plus
- slug: derekxkwan--syntheory_plus
- source: huggingface
- author: derekxkwan
- license: MIT
- tags: license:mit, size_categories:100k<n<1m, region:us, music
đ Engagement & Metrics
- downloads: 123,557
- stars: 0
- forks: 0