Dataset

VoiceAssistant-400K

by shenyunhang hf-dataset--shenyunhang--voiceassistant-400k
Data source: https://huggingface.co/datasets/gpt-omni/VoiceAssistant-400K. Question and answer audio is extracted, yielding 940,108 audio files. Raw conversations are stored in multi-round chat format in data.jsonl, which contains 251,223 samples in total.

Entity Passport
Registry ID hf-dataset--shenyunhang--voiceassistant-400k
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__shenyunhang__voiceassistant_400k,
  author = {shenyunhang},
  title = {Voiceassistant 400k Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/shenyunhang/VoiceAssistant-400K}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
shenyunhang. (2026). Voiceassistant 400k [Dataset]. Free2AITools. https://huggingface.co/datasets/shenyunhang/VoiceAssistant-400K

Technical Deep Dive

Nexus Index V2.0

28.0
ESTIMATED IMPACT TIER
Semantic (S) 50
Authority (A) 0
Popularity (P) 0
Recency (R) 0
Quality (Q) 0

Downloads: 283,794
Likes: 2

đŸ‘ī¸ Data Preview

📊

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

🔗 Explore Full Dataset ↗

đŸ§Ŧ Field Logic

đŸ§Ŧ

Schema not yet indexed for this dataset.

Dataset Specification


license: apache-2.0


Data source:
https://huggingface.co/datasets/gpt-omni/VoiceAssistant-400K

  1. Question and answer audio is extracted, yielding 940,108 audio files.
  2. Raw conversations are stored in multi-round chat format in data.jsonl, which contains 251,223 samples in total.
[
    ...
    {
        "messages": [
            {
                "role": "user",
                "content": "...<|audio|>"
            },
            {
                "role": "assistant",
                "content": "...<|audio|>"
            },
            {
                "role": "user",
                "content": "...<|audio|>"
            },
            {
                "role": "assistant",
                "content": "...<|audio|>"
            },
            ...
        ],
        "audios": ["path/to/first/audio", "path/to/second/audio", "path/to/third/audio", "path/to/fourth/audio", ...]
    },
    ...
]
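
The sample layout above suggests that each `<|audio|>` placeholder in a message lines up, in order, with one entry of `audios`. The following is a minimal Python sketch of that pairing, not the dataset author's loader. It assumes the ordering rule just described and assumes that each line of data.jsonl holds one JSON object; the card does not confirm either. The helper name `pair_audio` and the file names `q1.wav` and `a1.wav` are hypothetical.

```python
import json

# Placeholder string as it appears in the sample record on the card.
AUDIO_TOKEN = "<|audio|>"


def pair_audio(sample):
    """Return a list of (role, audio_paths) tuples, one per message.

    Assumption (not confirmed by the card): placeholders are matched to
    the "audios" list strictly in order of appearance.
    """
    paths = sample["audios"]
    paired = []
    used = 0
    for message in sample["messages"]:
        count = message["content"].count(AUDIO_TOKEN)
        paired.append((message["role"], paths[used:used + count]))
        used += count
    if used != len(paths):
        raise ValueError(
            f"{used} audio placeholders but {len(paths)} audio paths"
        )
    return paired


# A hypothetical one-line record standing in for a line of data.jsonl.
line = json.dumps({
    "messages": [
        {"role": "user", "content": "Hi<|audio|>"},
        {"role": "assistant", "content": "Hello<|audio|>"},
    ],
    "audios": ["q1.wav", "a1.wav"],
})

pairs = pair_audio(json.loads(line))
print(pairs)
```

The count check at the end is a cheap guard: if a record has more or fewer placeholders than audio paths, the ordering assumption does not hold for it and the record should be inspected by hand.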


đŸ›Ąī¸ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

Identity & Source

id: hf-dataset--shenyunhang--voiceassistant-400k
source: huggingface
author: shenyunhang
tags: license:apache-2.0, region:us

âš™ī¸ Technical Specs

architecture
null
params billions
null
context length
409,600

Engagement & Metrics

likes: 2
downloads: 283,794
