TurkmenSpeech

Dataset by mamed0v


Dataset Information Summary

Registry ID	hf-dataset--mamed0v--turkmenspeech
Provider	huggingface


Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__mamed0v__turkmenspeech,
  author = {mamed0v},
  title = {TurkmenSpeech Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/mamed0v/TurkmenSpeech}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

mamed0v. (2026). TurkmenSpeech [Dataset]. Free2AITools. https://huggingface.co/datasets/mamed0v/TurkmenSpeech


Dataset Specification

pretty_name: Turkmen Speech Dataset
language:
- tk
license: cc-by-nc-4.0
task_categories:
- automatic-speech-recognition
- text-to-speech
size_categories:
- 100K<n<1M
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: duration
    dtype: float32
  - name: start
    dtype: float32
  - name: end
    dtype: float32
  - name: source
    dtype: string
  - name: speaker_id
    dtype: string
  - name: id
    dtype: string
  - name: language
    dtype: string
  - name: split
    dtype: string
  - name: sampling_rate
    dtype: int32
  splits:
  - name: train
    num_examples: 119853
  dataset_size: 10043004.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/metadata.csv
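The `configs` entry points the `train` split at `data/metadata.csv`. The Python below is a minimal sketch of reading that file without the Hugging Face tooling, mapping each CSV row onto the declared features. It assumes that the CSV header uses the feature names listed above and that the `audio` column holds a relative file path. Neither assumption is stated on the card. `SpeechRecord`, `parse_metadata` and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class SpeechRecord:
    """One row of metadata.csv, typed according to the card's declared dtypes."""
    audio: str          # ASSUMPTION: relative path to the clip's audio file
    text: str           # Turkmen transcription
    duration: float     # float32 on the card
    start: float        # float32 on the card
    end: float          # float32 on the card
    source: str         # YouTube link
    speaker_id: str     # currently unknown, so often empty
    id: str
    language: str
    split: str
    sampling_rate: int  # int32 on the card


FIELD_NAMES = [f.name for f in fields(SpeechRecord)]
FLOAT_FIELDS = ("duration", "start", "end")


def parse_metadata(csv_text: str) -> list[SpeechRecord]:
    """Parse metadata.csv text into typed records, ignoring unknown columns."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = {name: row[name] for name in FIELD_NAMES}
        for name in FLOAT_FIELDS:
            values[name] = float(values[name])
        values["sampling_rate"] = int(values["sampling_rate"])
        records.append(SpeechRecord(**values))
    return records


# Hypothetical one-row sample; the values are illustrative, not taken from the dataset.
SAMPLE_CSV = (
    "audio,text,duration,start,end,source,speaker_id,id,language,split,sampling_rate\n"
    "clips/0001.wav,Salam dünýä,2.5,10.0,12.5,"
    "https://www.youtube.com/watch?v=EXAMPLE,,0001,tk,train,16000\n"
)

if __name__ == "__main__":
    for record in parse_metadata(SAMPLE_CSV):
        print(record.id, record.duration, record.text)
```

In most workflows the Hugging Face `datasets` library's `load_dataset` would be used instead, because it decodes the `audio` column automatically. The sketch is mainly useful for inspecting the metadata on its own.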

Turkmen Speech Dataset (ASR)

This dataset contains 251 hours of Turkmen speech audio with transcriptions, intended for training and evaluating Automatic Speech Recognition (ASR) models.
It is one of the largest publicly available Turkmen speech datasets.
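The card declares only a `train` split, so users who want to evaluate on this data have to carve out a held-out set themselves. One possible approach hashes the `source` link so that every clip cut from the same YouTube video falls on the same side of the split. Speaker IDs are not populated, so grouping by video serves as a rough proxy for keeping speakers apart. `assign_split` and its 5% default are hypothetical choices and are not part of the dataset.

```python
import hashlib


def assign_split(source_url: str, eval_fraction: float = 0.05) -> str:
    """Deterministically map a source video to 'train' or 'eval'.

    Hashing the source link (not the clip id) keeps all clips from one
    video together, which limits speaker and channel leakage.
    """
    digest = hashlib.sha256(source_url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    # Exact int/float comparison: fraction 0.0 -> never eval, 1.0 -> always eval.
    return "eval" if bucket < eval_fraction * 2**64 else "train"


if __name__ == "__main__":
    urls = [f"https://www.youtube.com/watch?v=video{i}" for i in range(1000)]
    held_out = sum(assign_split(u) == "eval" for u in urls)
    print(f"{held_out} of {len(urls)} hypothetical videos assigned to eval")
```

Because the assignment depends only on the link, it stays stable across runs and machines without storing a split file.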

Dataset Overview

Property	Value
Total clips	119,847
Total duration	251.86 hours
Sampling rate	16,000 Hz
Language	Turkmen (`tk`)
Split	`train`
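The table's figures can be related with a quick back-of-the-envelope calculation. Spreading 251.86 hours over 119,847 clips gives a mean clip length of roughly 7.6 seconds, and at 16,000 Hz the corpus holds about 14.5 billion samples. The raw-size figure (about 29 GB) assumes the audio is decoded to mono 16-bit PCM, which the card does not specify. `corpus_stats` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600


def corpus_stats(total_hours: float, num_clips: int, sample_rate_hz: int,
                 bytes_per_sample: int = 2) -> dict[str, float]:
    """Derive mean clip length, sample count and decoded size from corpus totals."""
    total_seconds = total_hours * SECONDS_PER_HOUR
    total_samples = total_seconds * sample_rate_hz
    return {
        "mean_clip_seconds": total_seconds / num_clips,
        "total_samples": total_samples,
        # ASSUMPTION: mono audio, 2 bytes per sample (16-bit PCM) once decoded.
        "raw_pcm_gb": total_samples * bytes_per_sample / 1e9,
    }


if __name__ == "__main__":
    stats = corpus_stats(251.86, 119_847, 16_000)
    for key, value in stats.items():
        print(f"{key}: {value:,.3f}")
```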

Each item includes:

- audio: waveform and sampling rate
- text: Turkmen transcription
- duration: clip length in seconds
- start / end: timestamps marking where the clip was cut from the original source
- source: link to the source YouTube video
- speaker_id: not yet populated (speakers are currently unknown)
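The `duration`, `start`, `end` and `sampling_rate` fields support a cheap per-clip sanity check. The sketch below assumes that `start` and `end` are in seconds and that `duration` should equal `end - start`. The card states neither, so treat `check_clip` and its 50 ms tolerance as hypothetical.

```python
import math


def check_clip(duration: float, start: float, end: float,
               num_samples: int, sampling_rate: int = 16_000,
               tol_seconds: float = 0.05) -> list[str]:
    """Return a list of problems found in one clip (an empty list means OK).

    ASSUMPTION: start/end are seconds in the source video and
    duration is expected to equal end - start.
    """
    problems = []
    if end <= start:
        problems.append("end is not after start")
    if not math.isclose(duration, end - start, abs_tol=tol_seconds):
        problems.append("duration does not match end - start")
    # Compare the decoded waveform length with the stored duration.
    decoded_seconds = num_samples / sampling_rate
    if not math.isclose(duration, decoded_seconds, abs_tol=tol_seconds):
        problems.append("decoded audio length does not match duration")
    return problems


if __name__ == "__main__":
    print(check_clip(duration=2.5, start=10.0, end=12.5, num_samples=40_000))
```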

Quality Notes

- 95% of clips are shorter than 20 seconds, a length suitable for ASR model training
- Real-world spoken Turkmen (news, interviews, dialogues, narration)
- Transcriptions use correct Turkmen diacritics
- No empty or silent clips
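When preparing a training subset, for example after re-segmenting the audio, the length and silence notes can be enforced as a filter. The sketch below keeps clips that are non-empty, at most 20 seconds long, and above an RMS energy floor. The 1e-3 threshold (for float samples in [-1, 1]) and the function names are assumptions, not values from the card. `fraction_under` reproduces the kind of duration statistic quoted above.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of float samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def keep_for_training(samples: Sequence[float], sampling_rate: int = 16_000,
                      max_seconds: float = 20.0,
                      silence_rms: float = 1e-3) -> bool:
    """Keep clips that are non-empty, not too long, and not silent."""
    if not samples:
        return False  # empty clip
    if len(samples) / sampling_rate > max_seconds:
        return False  # longer than the length limit
    return rms(samples) >= silence_rms  # ASSUMPTION: hypothetical silence floor


def fraction_under(durations: Sequence[float], limit: float = 20.0) -> float:
    """Share of clips strictly shorter than `limit` seconds."""
    return sum(d < limit for d in durations) / len(durations)


if __name__ == "__main__":
    tone = [0.1 * math.sin(2 * math.pi * 440 * n / 16_000) for n in range(16_000)]
    print(keep_for_training(tone), keep_for_training([0.0] * 16_000))
```

Running `fraction_under` over the dataset's `duration` column is one way to check the 95% figure.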

License

This dataset is released under the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license.

You are free to:

- Use, share, and modify the dataset for research and other non-commercial purposes only

You must:

- Provide proper attribution
- Refrain from using the dataset or derivative works for commercial purposes

Full license text: https://creativecommons.org/licenses/by-nc/4.0/


📊 Engagement & Metrics

likes: 3
downloads: 12,877
