GTSinger
| Entity Passport | |
| Registry ID | hf-dataset--aaronz345--gtsinger |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__aaronz345__gtsinger,
author = {AaronZ345},
title = {GTSinger Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/AaronZ345/GTSinger}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Dataset Specification
license: cc-by-nc-sa-4.0
task_categories:
- text-to-audio
- text-to-speech
language:
- zh
- en
- fr
- ja
- ko
- es
- de
- ru
- it
tags:
- singing
- audio
- croissant
pretty_name: a
size_categories:
- 1B<n<10B
configs:
- config_name: meta
  data_files: processed/All/metadata.json
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Yu Zhang*, Changhao Pan*, Wenxiang Guo*, Ruiqi Li, Zhiyuan Zhu, Jialei Wang, Wenhao Xu, Jingyu Lu, Zhiqing Hong, Chuxin Wang, LiChao Zhang, Jinzheng He, Ziyue Jiang, Yuxin Chen, Chen Yang, Jiecheng Zhou, Xinyu Cheng, Zhou Zhao | Zhejiang University
Dataset of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks.
We introduce GTSinger, a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks.
We provide the full corpus for free in this repository.
metadata.json and phone_set.json are also provided for each language in the processed folder. Note: you should change the wav_fn of each segment to your own absolute path! You can use the metadata of multiple languages by concatenating their data. We will provide the metadata for other languages soon!
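The note about wav_fn and about concatenating languages can be sketched in Python as follows. This is a minimal sketch, assuming each metadata.json is a JSON array of segment dicts whose wav_fn is either relative to your local copy of the dataset or starts with a path prefix from the machine that produced it. The helper names and the old_prefix parameter are hypothetical and not part of any GTSinger tooling.

```python
import json
from pathlib import Path


def relocate_wav_fn(wav_fn, dataset_root, old_prefix=None):
    """Return an absolute wav path under dataset_root (hypothetical helper)."""
    path = wav_fn
    if old_prefix and path.startswith(old_prefix):
        # Drop the path prefix from the machine that generated the metadata.
        path = path[len(old_prefix):].lstrip("/\\")
    if Path(path).is_absolute():
        return path  # absolute and not matching old_prefix: keep as-is
    return str(Path(dataset_root).resolve() / path)


def load_and_relocate(metadata_paths, dataset_root, old_prefix=None):
    """Concatenate several metadata.json files into one list of segments,
    rewriting each segment's wav_fn to an absolute local path.

    Assumes each file is a JSON array of dicts that all contain "wav_fn".
    """
    merged = []
    for meta_path in metadata_paths:
        with open(meta_path, encoding="utf-8") as f:
            segments = json.load(f)
        for seg in segments:
            seg["wav_fn"] = relocate_wav_fn(seg["wav_fn"], dataset_root, old_prefix)
            merged.append(seg)
    return merged
```

The merged list can be written back with json.dump and then used like a single-language metadata file. The input paths are whichever per-language metadata.json files you downloaded from processed.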
Besides, we also provide our dataset on Google Drive.
Moreover, you can visit our Demo Page for the audio samples of our dataset as well as the results of our benchmarks.
Updates
- 2025.02: We released all processed data of GTSinger and refined 7/9 languages!
- 2024.10: We refined the paired speech data of each language!
- 2024.10: We released the processed data of Chinese, English, Spanish, German, Russian!
- 2024.09: We released the full dataset of GTSinger!
- 2024.09: GTSinger is accepted by NeurIPS 2024 (Spotlight)!
Key Features
- 80.59 hours of singing voices in GTSinger are recorded in professional studios by skilled singers, ensuring high quality and clarity, forming the largest recorded singing dataset.
- Contributed by 20 singers across nine widely spoken languages (Chinese, English, Japanese, Korean, Russian, Spanish, French, German, and Italian) and all four vocal ranges, GTSinger enables zero-shot SVS and style transfer models to learn diverse timbres and styles.
- GTSinger provides controlled comparisons and phoneme-level annotations of six singing techniques (mixed voice, falsetto, breathy, pharyngeal, vibrato, and glissando) for songs, thereby facilitating singing technique modeling, recognition, and control.
- Unlike fine-grained music scores, GTSinger features realistic music scores with regular note durations, assisting singing models in learning and adapting to real-world musical composition.
- The dataset includes manual phoneme-to-audio alignments, global style labels (singing method, emotion, range, and pace), and 16.16 hours of paired speech, ensuring comprehensive annotations and broad task suitability.
Dataset
Where to download
Through this repo you can access our full dataset (audio along with TextGrid, json, and musicxml files) and processed data (metadata.json, phone_set.json, spker_set.json) on Hugging Face for free! We hope our data is helpful for your research.
Besides, we also provide our dataset on Google Drive.
Please note that by using GTSinger, you accept the terms of its license.
Data Architecture
Our dataset is organized hierarchically.
It has nine top-level folders, one per language.
Each language folder contains one folder per singer (e.g., EN-Alto-1), and each singer folder contains five sub-folders, one per singing technique (mixed voice and falsetto share the Mixed_Voice_and_Falsetto folder).
These technique folders contain numerous song entries, each further divided into controlled comparison groups: a control group (natural singing without the specific technique) and a technique group (densely employing the specific technique). A song can also include a Paired_Speech_Group folder with the corresponding speech recordings.
Our singing voices and speech are recorded at a 48kHz sampling rate with 24-bit resolution in WAV format.
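To check that a downloaded file matches the stated 48 kHz / 24-bit format, a small helper built on Python's standard wave module is enough. This is a sketch: describe_wav and matches_gtsinger_spec are hypothetical names, and wave only reads uncompressed PCM. Older Python versions may reject 24-bit files written with a WAVE_FORMAT_EXTENSIBLE header; in that case a library such as soundfile is an alternative.

```python
import wave


def describe_wav(path):
    """Return (sample_rate, bit_depth, channels, duration_seconds) of a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        bits = w.getsampwidth() * 8  # sample width is reported in bytes
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return rate, bits, channels, duration


def matches_gtsinger_spec(path):
    """True if the file is 48 kHz / 24-bit, the format stated for GTSinger recordings."""
    rate, bits, _, _ = describe_wav(path)
    return rate == 48000 and bits == 24
```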
Alignments and annotations are provided in TextGrid files, including word boundaries, phoneme boundaries, phoneme-level annotations for six techniques, and global style labels (singing method, emotion, pace, and range).
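If the TextGrid files use Praat's long ("ooTextFile") text format, their interval tiers can be read without extra dependencies. The reader below is a minimal sketch under that assumption. It handles interval tiers only and ignores point tiers and multi-line labels. The tier names in the test ("words", "phones") are placeholders, because the actual tier names in GTSinger's TextGrid files are not documented here.

```python
import re
from collections import defaultdict

_ITEM_HDR = re.compile(r'^\s*item \[\d+\]:\s*$')
_INTERVAL_HDR = re.compile(r'^\s*intervals \[\d+\]:\s*$')
_NAME = re.compile(r'^\s*name = "(.*)"\s*$')
_BOUND = re.compile(r'^\s*(xmin|xmax) = ([-+0-9.eE]+)\s*$')
_TEXT = re.compile(r'^\s*text = "(.*)"\s*$')


def read_textgrid_intervals(text):
    """Parse a long-format TextGrid into {tier_name: [(start, end, label), ...]}."""
    tiers = defaultdict(list)
    tier = None
    current = None  # bounds of the interval being read; None outside an interval
    for line in text.splitlines():
        if _ITEM_HDR.match(line):
            current = None  # a new tier starts
        elif (m := _NAME.match(line)) and current is None:
            tier = m.group(1)
        elif _INTERVAL_HDR.match(line):
            current = {}
        elif current is not None and (m := _BOUND.match(line)):
            current[m.group(1)] = float(m.group(2))
        elif current is not None and (m := _TEXT.match(line)):
            label = m.group(1).replace('""', '"')  # Praat escapes quotes by doubling
            tiers[tier].append((current["xmin"], current["xmax"], label))
            current = None
    return dict(tiers)
```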
We also provide realistic music scores in musicxml format.
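The MusicXML scores can be inspected with xml.etree.ElementTree from the standard library. The sketch below assumes uncompressed, partwise .musicxml files (compressed .mxl would need unzipping first). It returns each note's MIDI pitch (None for rests), its duration in MusicXML divisions (convert with the score's divisions value), and its first lyric syllable. read_musicxml_notes is a hypothetical helper that gives chords, ties, and multiple parts no special treatment.

```python
import xml.etree.ElementTree as ET

_STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def read_musicxml_notes(xml_text):
    """Return [(midi_pitch_or_None, duration_in_divisions, lyric_or_None), ...]."""
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.iter("note"):
        if note.find("grace") is not None:
            continue  # grace notes have no <duration>
        midi = None
        pitch = note.find("pitch")
        if pitch is not None:
            step = pitch.findtext("step")
            octave = int(pitch.findtext("octave"))
            alter = float(pitch.findtext("alter", default="0"))
            # MIDI convention: C4 = 60.
            midi = 12 * (octave + 1) + _STEP_TO_SEMITONE[step] + int(round(alter))
        duration = int(note.findtext("duration", default="0"))
        lyric = note.findtext("lyric/text")
        notes.append((midi, duration, lyric))
    return notes
```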
Notably, we provide an additional JSON file for each singing voice, facilitating data parsing and processing for singing models.
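Each recording comes with several companion files, so it can help to gather them into one bundle per recording before training. The sketch below assumes, hypothetically, that a recording's .wav, .TextGrid, .json, and .musicxml files share the same file stem inside a group folder. If the real naming differs, the stem logic would need adjusting. The function names are illustrative only.

```python
from collections import defaultdict
from pathlib import Path

# Companion file types per recording (compared case-insensitively).
EXPECTED_SUFFIXES = (".wav", ".textgrid", ".json", ".musicxml")


def bundle_by_stem(folder):
    """Group files under folder by path-without-extension: {stem: {suffix: path}}."""
    bundles = defaultdict(dict)
    for f in Path(folder).rglob("*"):
        suffix = f.suffix.lower()
        if f.is_file() and suffix in EXPECTED_SUFFIXES:
            bundles[f.with_suffix("")][suffix] = f
    return dict(bundles)


def incomplete_bundles(bundles):
    """Return {stem: sorted missing suffixes} for recordings lacking a companion file."""
    missing = {}
    for stem, files in bundles.items():
        absent = sorted(set(EXPECTED_SUFFIXES) - set(files))
        if absent:
            missing[stem] = absent
    return missing
```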
Here is the data structure of our dataset:
.
├── Chinese
│   ├── ZH-Alto-1
│   └── ZH-Tenor-1
├── English
│   ├── EN-Alto-1
│   │   ├── Breathy
│   │   ├── Glissando
│   │   │   └── my love
│   │   │       ├── Control_Group
│   │   │       ├── Glissando_Group
│   │   │       └── Paired_Speech_Group
│   │   ├── Mixed_Voice_and_Falsetto
│   │   ├── Pharyngeal
│   │   └── Vibrato
│   ├── EN-Alto-2
│   │   ├── Breathy
│   │   ├── Glissando
│   │   ├── Mixed_Voice_and_Falsetto
│   │   ├── Pharyngeal
│   │   └── Vibrato
│   └── EN-Tenor-1
│       ├── Breathy
│       ├── Glissando
│       ├── Mixed_Voice_and_Falsetto
│       ├── Pharyngeal
│       └── Vibrato
├── French
│   ├── FR-Soprano-1
│   └── FR-Tenor-1
├── German
│   ├── DE-Soprano-1
│   └── DE-Tenor-1
├── Italian
│   ├── IT-Bass-1
│   ├── IT-Bass-2
│   └── IT-Soprano-1
├── Japanese
│   ├── JA-Soprano-1
│   └── JA-Tenor-1
├── Korean
│   ├── KO-Soprano-1
│   ├── KO-Soprano-2
│   └── KO-Tenor-1
├── Russian
│   └── RU-Alto-1
└── Spanish
    ├── ES-Bass-1
    └── ES-Soprano-1
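The Language / Singer / Technique / Song / Group hierarchy shown in the tree can be indexed with pathlib, for example to pair each control group with its technique group for controlled comparisons. This is a sketch under two assumptions. First, group folders sit exactly five levels below the dataset root, as in the English example. Second, any folder ending in _Group other than Control_Group and Paired_Speech_Group is a technique group. Only Glissando_Group appears in the tree, so group names for the other techniques are extrapolated.

```python
from collections import defaultdict
from pathlib import Path

TECHNIQUES = {"Breathy", "Glissando", "Mixed_Voice_and_Falsetto", "Pharyngeal", "Vibrato"}


def index_songs(root):
    """Map (language, singer, technique, song) -> sorted list of group folder names."""
    root = Path(root)
    index = defaultdict(list)
    for group_dir in root.glob("*/*/*/*/*"):
        if not group_dir.is_dir():
            continue
        language, singer, technique, song, group = group_dir.relative_to(root).parts
        if technique not in TECHNIQUES:
            continue  # skip folders outside the documented technique folders
        index[(language, singer, technique, song)].append(group)
    return {key: sorted(groups) for key, groups in index.items()}


def comparison_pairs(index):
    """Yield (song_key, control_group, technique_group) for songs that have both."""
    for key, groups in sorted(index.items()):
        if "Control_Group" not in groups:
            continue
        for g in groups:
            if g.endswith("_Group") and g not in ("Control_Group", "Paired_Speech_Group"):
                yield key, "Control_Group", g
```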
Citations
If you find this dataset useful in your research, please cite our work:
@article{zhang2024gtsinger,
title={Gtsinger: A global multi-technique singing corpus with realistic music scores for all singing tasks},
author={Zhang, Yu and Pan, Changhao and Guo, Wenxiang and Li, Ruiqi and Zhu, Zhiyuan and Wang, Jialei and Xu, Wenhao and Lu, Jingyu and Hong, Zhiqing and Wang, Chuxin and others},
journal={arXiv preprint arXiv:2409.13832},
year={2024}
}
Disclaimer
Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's singing without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.
Dataset Transparency Report
Verified data manifest for traceability and transparency.
Identity & Source
- id
- hf-dataset--aaronz345--gtsinger
- source
- huggingface
- author
- AaronZ345
- tags
- task_categories:text-to-audio, task_categories:text-to-speech, language:zh, language:en, language:fr, language:ja, language:ko, language:es, language:de, language:ru, language:it, license:cc-by-nc-sa-4.0, size_categories:10k, format:json, modality:audio, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2409.13832, doi:10.57967/hf/5398, region:us, singing, audio, croissant
Engagement & Metrics
- likes
- 12
- downloads
- 107,985