Dataset

TurkmenSpeech

by mamed0v hf-dataset--mamed0v--turkmenspeech
Nexus Index
38.0 Top 0%

Entity Passport
Registry ID hf-dataset--mamed0v--turkmenspeech
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__mamed0v__turkmenspeech,
  author = {mamed0v},
  title = {TurkmenSpeech Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/mamed0v/TurkmenSpeech}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
mamed0v. (2026). TurkmenSpeech [Dataset]. Free2AITools. https://huggingface.co/datasets/mamed0v/TurkmenSpeech

Technical Deep Dive

Nexus Index V16.5

38.0
ESTIMATED IMPACT TIER
Popularity (P) 0
Freshness (F) 0
Completeness (C) 0
Utility (U) 0

Index Insight

The Free2AITools Nexus Index for TurkmenSpeech aggregates Popularity (P:0), Freshness (F:0), and Completeness (C:0). The Utility score (U:0) represents deployment readiness and ecosystem adoption.

Downloads
12,877
Likes
3

Dataset Specification


pretty_name: Turkmen Speech Dataset
language:
  - tk
license: cc-by-nc-4.0
task_categories:
  - automatic-speech-recognition
  - text-to-speech
size_categories:
  - 100K<n<1M
dataset_info:
  features:
    - name: audio
      dtype: audio
    - name: text
      dtype: string
    - name: duration
      dtype: float32
    - name: start
      dtype: float32
    - name: end
      dtype: float32
    - name: source
      dtype: string
    - name: speaker_id
      dtype: string
    - name: id
      dtype: string
    - name: language
      dtype: string
    - name: split
      dtype: string
    - name: sampling_rate
      dtype: int32
  splits:
    - name: train
      num_examples: 119853
  dataset_size: 10043004.0
configs:
  - config_name: default
    data_files:
      - split: train
        path: data/metadata.csv

Turkmen Speech Dataset (ASR)

This dataset contains 251 hours of Turkmen speech audio with transcriptions, intended for training and evaluating Automatic Speech Recognition (ASR) models.
It is one of the largest publicly available Turkmen speech datasets.

Dataset Overview

Property        Value
Total clips     119,847
Total duration  251.86 hours
Sampling rate   16,000 Hz
Language        Turkmen (tk)
Split           train
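
As a quick sanity check on the figures above, the average clip length can be derived from the total duration and the clip count. The sketch below only does arithmetic on the two numbers from the table; the function name is illustrative and not part of the dataset.

```python
# Figures copied from the overview table.
TOTAL_CLIPS = 119_847
TOTAL_HOURS = 251.86


def mean_clip_seconds(total_hours: float, total_clips: int) -> float:
    """Return the average clip length in seconds."""
    if total_clips <= 0:
        raise ValueError("total_clips must be positive")
    return total_hours * 3600.0 / total_clips


avg = mean_clip_seconds(TOTAL_HOURS, TOTAL_CLIPS)
print(f"mean clip length: {avg:.2f} s")

# The clip count also falls inside the declared 100K<n<1M size category.
in_size_bucket = 100_000 < TOTAL_CLIPS < 1_000_000
```

The result is roughly 7.6 seconds per clip, which fits the note that most clips are short.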

Each item includes:

  • audio: waveform + sampling rate
  • text: Turkmen transcription
  • duration: clip length in seconds
  • start, end: timestamps of the clip within the original source
  • source: YouTube link
  • speaker_id: currently unknown
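
The fields listed above can be modelled as a small typed record. The following is a minimal sketch that assumes the metadata is a CSV file whose column names match the field names and whose audio column holds a file path; neither assumption is confirmed by the card. The sample row is hypothetical and not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Clip:
    audio: str       # assumed: path to the audio file
    text: str        # Turkmen transcription
    duration: float  # clip length in seconds
    start: float     # start timestamp in the original source
    end: float       # end timestamp in the original source
    source: str      # YouTube link
    speaker_id: str  # currently unknown for all clips


def read_clips(handle):
    """Parse metadata rows from an open CSV file into Clip records."""
    return [
        Clip(
            audio=row["audio"],
            text=row["text"],
            duration=float(row["duration"]),
            start=float(row["start"]),
            end=float(row["end"]),
            source=row["source"],
            speaker_id=row["speaker_id"],
        )
        for row in csv.DictReader(handle)
    ]


# Hypothetical sample row, for illustration only.
SAMPLE = """audio,text,duration,start,end,source,speaker_id
clips/example_0001.wav,salam,3.5,12.0,15.5,https://www.youtube.com/watch?v=EXAMPLE,unknown
"""

clips = read_clips(io.StringIO(SAMPLE))
print(clips[0])
```

In the sample row, duration equals end minus start. Whether that relationship holds for the real data is an assumption worth checking before relying on it.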

Quality Notes

  • 95% of clips are under 20 seconds, suitable for ASR model training
  • Contains real-world spoken Turkmen (news, interviews, dialogues, narration)
  • Transcriptions use correct Turkmen diacritics
  • No empty or silent clips
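
The 20-second figure in the notes above suggests a simple length filter when preparing training batches. Below is a minimal sketch, assuming each record is a dictionary with a numeric duration key. The durations are made-up values for illustration, not measurements from the dataset.

```python
def share_under(durations, limit_s=20.0):
    """Return the fraction of durations strictly below limit_s."""
    if not durations:
        raise ValueError("durations must not be empty")
    return sum(1 for d in durations if d < limit_s) / len(durations)


def keep_short(records, limit_s=20.0):
    """Keep only records whose duration is below limit_s seconds."""
    return [r for r in records if r["duration"] < limit_s]


# Hypothetical durations in seconds.
records = [{"duration": d} for d in (3.2, 7.5, 12.0, 19.9, 24.1)]

share = share_under([r["duration"] for r in records])
short = keep_short(records)
print(f"{share:.0%} of sample clips are under 20 s; kept {len(short)}")
```

Running the same check on the real duration column is one way to verify the 95% figure independently.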

License

This dataset is released under the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license.

You are free to:

  • Use, share, and modify the dataset for research and non-commercial purposes only

You must:

  • Provide proper attribution
  • Not use the dataset or derivative works for commercial purposes

Full license text: https://creativecommons.org/licenses/by-nc/4.0/


Dataset Transparency Report

Verified data manifest for traceability and transparency.

Identity & Source

id
hf-dataset--mamed0v--turkmenspeech
source
huggingface
author
mamed0v
tags
task_categories:automatic-speech-recognition, task_categories:text-to-speech, language:tk, license:cc-by-nc-4.0, size_categories:100K<n<1M, region:us
