Dataset

Musdb18 Processed

by Cs229 Audio Ml Project hf-dataset--cs229-audio-ml-project--musdb18-processed

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--cs229-audio-ml-project--musdb18-processed
Provider	huggingface

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__cs229_audio_ml_project__musdb18_processed,
  author = {Cs229 Audio Ml Project},
  title = {Musdb18 Processed Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/cs229-audio-ml-project/musdb18-processed}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

Cs229 Audio Ml Project. (2026). Musdb18 Processed [Dataset]. Free2AITools. https://huggingface.co/datasets/cs229-audio-ml-project/musdb18-processed

Downloads

42,925

Dataset Specification

license: mit
task_categories:
- audio-classification
- audio-to-audio
tags:
- audio
- music
- source-separation
- musdb18
- stems
- active-segments
- cs229
- stanford
pretty_name: "MUSDB18 Active Stems - CS229 Project"
size_categories:
- 10K<n<100K

MUSDB18 Active Stems Dataset - CS229 Project

This dataset contains active stem segments extracted from the MUSDB18 dataset for the Stanford CS229 Machine Learning course project on audio source separation.

Dataset Description

This is a processed version of the MUSDB18 dataset containing only the active segments of each stem (drums, bass, vocals, accompaniment, and mixture), designed to improve training efficiency for music source separation models.

Key Features

Active Segment Detection: Only segments where stems have significant energy
5 Stems: mixture, drums, bass, vocals, accompaniment
Consistent Format: 22.05 kHz sample rate, mono audio
Rich Metadata: Detailed segment information and statistics

CS229 Project Context

This dataset was created as part of a Stanford CS229 course project focusing on:

Music source separation using deep learning
Comparison of different neural architectures (Conv-TasNet, etc.)
Analysis of active vs. inactive audio segments in training

Dataset Structure

extracted_stems/
├── train/              # Training split
│   ├── drums/         # Active drum segments
│   ├── bass/          # Active bass segments  
│   ├── vocals/        # Active vocal segments
│   ├── accompaniment/ # Active accompaniment segments
│   └── mixture/       # Active mixture segments
├── test/              # Test split (same structure)
└── metadata/          # JSON metadata files
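As a rough illustration of how this layout can be traversed, the sketch below counts the audio files per stem in one split. It assumes the segments are stored as `.wav` files directly inside each stem folder (as the manual loading example suggests); `count_segments` is a hypothetical helper, not part of the dataset.

```python
from pathlib import Path

# Stem folder names taken from the directory tree above.
STEMS = ("drums", "bass", "vocals", "accompaniment", "mixture")


def count_segments(root, split="train"):
    """Count .wav files per stem under <root>/<split>/<stem>/.

    Assumption: segments are .wav files stored directly in each stem folder.
    """
    root = Path(root)
    return {
        stem: len(list((root / split / stem).glob("*.wav")))
        for stem in STEMS
    }
```

For example, `count_segments("extracted_stems", split="test")` would report the per-stem counts for the test split, if the files follow the assumed layout.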

Quick Start

Loading with datasets library

from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("cs229-audio-ml-project/musdb18-processed")

# Access training data
train_data = dataset["train"]
for item in train_data:
    audio = item["audio"]["array"]   # decoded waveform
    stem_type = item["stem_type"]    # e.g. "vocals"
    track_name = item["track_name"]
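Once records are loaded, it can be useful to group them by stem. The sketch below assumes each record exposes the `stem_type` and `track_name` fields used in the loop above; `group_by_stem` is a hypothetical helper, not part of the dataset or of the `datasets` library, and it works on any iterable of dict-like records.

```python
from collections import defaultdict


def group_by_stem(items):
    """Map each stem type to the list of track names seen for it.

    Assumption: every record has "stem_type" and "track_name" keys.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item["stem_type"]].append(item["track_name"])
    return dict(groups)
```

Passing `dataset["train"]` to this function would iterate the whole split, so for a quick look it may be cheaper to pass a small slice of records.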

Manual loading

import soundfile as sf
import json

# Load an audio segment
audio, sr = sf.read("train/vocals/track_vocals_001.wav")

# Load metadata
with open("metadata/train_metadata.json") as f:
    metadata = json.load(f)
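After loading a file manually, a quick sanity check against the documented format (22,050 Hz, mono) can catch mismatched files early. This is a minimal sketch; `format_problems` is a hypothetical helper, and it takes the `audio` array and `sr` value in the form `sf.read` returns them.

```python
import numpy as np

EXPECTED_SR = 22_050  # sample rate stated in the dataset card


def format_problems(audio, sr):
    """Return a list of human-readable mismatches with the documented format."""
    problems = []
    if sr != EXPECTED_SR:
        problems.append(f"sample rate is {sr}, expected {EXPECTED_SR}")
    # A mono signal is a 1-D array; multi-channel audio has a second axis.
    if np.asarray(audio).ndim != 1:
        problems.append("audio is not mono")
    return problems
```

An empty list means the segment matches the documented sample rate and channel layout.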

Extraction Parameters

Segment Length: 4.0 seconds
Hop Length: 2.0 seconds (50% overlap)
Energy Threshold: 0.01 RMS
Sample Rate: 22,050 Hz
Minimum Duration: 1.0 seconds
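At 22,050 Hz, a 4.0-second segment is 88,200 samples and a 2.0-second hop is 44,100 samples. The sketch below shows one way these parameters could drive active-segment detection with a sliding window and an RMS test. It is not the project's actual extraction code: the function name, the strict `>` comparison, and the reading of "Minimum Duration" as the shortest trailing window that is kept are all assumptions.

```python
import numpy as np

SAMPLE_RATE = 22_050     # Hz
SEGMENT_SECONDS = 4.0    # window length
HOP_SECONDS = 2.0        # 50% overlap between consecutive windows
RMS_THRESHOLD = 0.01     # energy threshold
MIN_SECONDS = 1.0        # assumed: shortest trailing window that is kept


def active_segments(audio, sr=SAMPLE_RATE):
    """Return (start, end) sample indices of windows whose RMS exceeds the threshold."""
    seg_len = int(SEGMENT_SECONDS * sr)
    hop_len = int(HOP_SECONDS * sr)
    min_len = int(MIN_SECONDS * sr)
    segments = []
    for start in range(0, len(audio), hop_len):
        window = audio[start:start + seg_len]
        # Assumption: a trailing window shorter than the minimum duration is dropped.
        if len(window) < min_len:
            break
        rms = float(np.sqrt(np.mean(np.square(window))))
        if rms > RMS_THRESHOLD:
            segments.append((start, start + len(window)))
    return segments
```

Under these assumptions a fully silent stem yields no segments, while a window that is only partly active can still pass if its overall RMS is above 0.01.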

Citation

If you use this dataset in your research, please cite:

@dataset{cs229_musdb18_active_stems,
  title={MUSDB18 Active Stems Dataset},
  author={CS229 Audio ML Project Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/cs229-audio-ml-project/musdb18-processed}
}

Original MUSDB18 citation:

@misc{musdb18,
  author       = {Rafii, Zafar and Liutkus, Antoine and Stöter, Fabian-Robert and Mimilakis, Stylianos Ioannis and Bittner, Rachel},
  title        = {MUSDB18-HQ - an uncompressed version of MUSDB18},
  month        = {December},
  year         = {2019},
  doi          = {10.5281/zenodo.3338373},
  url          = {https://doi.org/10.5281/zenodo.3338373}
}

Contact

For questions about this dataset or the CS229 project, please open an issue in this repository.

Created for Stanford CS229 - Machine Learning Course Project
