Dataset

Musdb18 Processed

by Cs229 Audio Ml Project hf-dataset--cs229-audio-ml-project--musdb18-processed

Registry ID hf-dataset--cs229-audio-ml-project--musdb18-processed
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__cs229_audio_ml_project__musdb18_processed,
  author = {Cs229 Audio Ml Project},
  title = {Musdb18 Processed Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/cs229-audio-ml-project/musdb18-processed}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Cs229 Audio Ml Project. (2026). Musdb18 Processed [Dataset]. Free2AITools. https://huggingface.co/datasets/cs229-audio-ml-project/musdb18-processed

Downloads
42,925

Dataset Specification


license: mit
task_categories:
  - audio-classification
  - audio-to-audio
tags:
  - audio
  - music
  - source-separation
  - musdb18
  - stems
  - active-segments
  - cs229
  - stanford
pretty_name: "MUSDB18 Active Stems - CS229 Project"
size_categories:
  - 10K<n<100K

MUSDB18 Active Stems Dataset - CS229 Project

This dataset contains active stem segments extracted from the MUSDB18 dataset for the Stanford CS229 Machine Learning course project on audio source separation.

Dataset Description

This is a processed version of the MUSDB18 dataset containing only the active segments of each stem (drums, bass, vocals, accompaniment, and mixture), designed to improve training efficiency for music source separation models.

Key Features

  • Active Segment Detection: Keeps only segments in which the stem has significant energy (RMS above the 0.01 threshold listed under Extraction Parameters)
  • 5 Stems: mixture, drums, bass, vocals, accompaniment
  • Consistent Format: 22.05 kHz sample rate, mono audio
  • Rich Metadata: Detailed segment information and statistics
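
The format claim above (22.05 kHz, mono) can be checked on any loaded segment. The following is a minimal sketch, not part of the dataset's own tooling: the helper name check_segment_format is hypothetical, and it assumes the audio is already in memory as a NumPy array together with its sample rate.

```python
import numpy as np

EXPECTED_SAMPLE_RATE = 22_050  # 22.05 kHz, as stated in the feature list


def check_segment_format(audio, sample_rate):
    """Return a list of problems; an empty list means the segment matches.

    Hypothetical helper for illustration only.
    """
    problems = []
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(
            f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz"
        )
    # Mono audio should be a 1-D array; stereo would be 2-D (frames, channels).
    if np.ndim(audio) != 1:
        problems.append(
            f"audio has shape {np.shape(audio)}, expected a 1-D mono signal"
        )
    return problems
```

A segment read with soundfile can be passed in directly as check_segment_format(audio, sr).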

CS229 Project Context

This dataset was created as part of a Stanford CS229 course project focusing on:

  • Music source separation using deep learning
  • Comparison of different neural architectures (Conv-TasNet, etc.)
  • Analysis of active vs. inactive audio segments in training

Dataset Structure

extracted_stems/
├── train/              # Training split
│   ├── drums/         # Active drum segments
│   ├── bass/          # Active bass segments
│   ├── vocals/        # Active vocal segments
│   ├── accompaniment/ # Active accompaniment segments
│   └── mixture/       # Active mixture segments
├── test/              # Test split (same structure)
└── metadata/          # JSON metadata files
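
Given a local copy with the layout above, a quick way to see how many segments each stem contributes is to count the files per folder. This is a sketch under assumptions: the function name count_segments is hypothetical, and the .wav extension is inferred from the manual-loading example rather than confirmed for every file.

```python
from pathlib import Path

SPLITS = ("train", "test")
STEMS = ("drums", "bass", "vocals", "accompaniment", "mixture")


def count_segments(root):
    """Count .wav files per split and stem under the extracted_stems root.

    Missing folders are reported as 0 rather than raising an error.
    """
    root = Path(root)
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for stem in STEMS:
            stem_dir = root / split / stem
            if stem_dir.is_dir():
                counts[split][stem] = len(list(stem_dir.glob("*.wav")))
            else:
                counts[split][stem] = 0
    return counts
```

Calling count_segments("extracted_stems") returns a nested dictionary keyed by split and then by stem, which makes imbalances between stems easy to spot before training.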

Quick Start

Loading with datasets library

from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("cs229-audio-ml-project/musdb18-processed")

# Access training data
train_data = dataset["train"]
for item in train_data:
    audio = item["audio"]["array"]
    stem_type = item["stem_type"]
    track_name = item["track_name"]

Manual loading

import soundfile as sf
import json

# Load an audio segment
audio, sr = sf.read("train/vocals/track_vocals_001.wav")

# Load metadata
with open("metadata/train_metadata.json") as f:
    metadata = json.load(f)

Extraction Parameters

  • Segment Length: 4.0 seconds
  • Hop Length: 2.0 seconds (50% overlap)
  • Energy Threshold: 0.01 RMS
  • Sample Rate: 22,050 Hz
  • Minimum Duration: 1.0 seconds
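
To make these parameters concrete, here is a minimal sketch of sliding-window active-segment detection. It is an illustration under stated assumptions, not the project's actual extraction code: the function name active_segments is hypothetical, the RMS is computed over each full 4.0-second window, and the 1.0-second minimum duration is not modeled because the card does not say how it is applied. At 22,050 Hz, a 4.0-second segment is 88,200 samples and a 2.0-second hop is 44,100 samples.

```python
import numpy as np

SAMPLE_RATE = 22_050      # Hz
SEGMENT_SECONDS = 4.0     # window length
HOP_SECONDS = 2.0         # 50% overlap between consecutive windows
RMS_THRESHOLD = 0.01      # windows at or below this RMS are treated as inactive


def active_segments(audio, sample_rate=SAMPLE_RATE,
                    segment_seconds=SEGMENT_SECONDS,
                    hop_seconds=HOP_SECONDS,
                    rms_threshold=RMS_THRESHOLD):
    """Return (start_sample, rms) for each full window whose RMS exceeds the threshold.

    Assumes a 1-D mono signal; a trailing partial window is ignored.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    kept = []
    for start in range(0, len(audio) - seg_len + 1, hop_len):
        window = audio[start:start + seg_len]
        rms = float(np.sqrt(np.mean(np.square(window))))
        if rms > rms_threshold:
            kept.append((start, rms))
    return kept
```

Because of the 50% overlap, a window that straddles a silent and an active region can still pass the threshold, so kept segments may contain up to a couple of seconds of near-silence.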

Citation

If you use this dataset in your research, please cite:

@dataset{cs229_musdb18_active_stems,
  title={MUSDB18 Active Stems Dataset},
  author={CS229 Audio ML Project Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/cs229-audio-ml-project/musdb18-processed}
}

Original MUSDB18 citation:

@misc{musdb18,
  author       = {Rafii, Zafar and Liutkus, Antoine and Stöter, Fabian-Robert and Mimilakis, Stylianos Ioannis and Bittner, Rachel},
  title        = {MUSDB18-HQ - an uncompressed version of MUSDB18},
  month        = {December},
  year         = {2019},
  doi          = {10.5281/zenodo.3338373},
  url          = {https://doi.org/10.5281/zenodo.3338373}
}

Contact

For questions about this dataset or the CS229 project, please open an issue in this repository.

Created for Stanford CS229 - Machine Learning Course Project

