🧠

Model

Symphony Asr

Name: Symphony Asr
Author: Okestro Ai Lab

Nexus Index

38.9 Top 6%

S: Semantic 50

A: Authority 0

P: Popularity 12

R: Recency 100

Q: Quality 65

Tech Context

4.75B Params

4,096 Ctx

Vital Performance

174 downloads / 30 days


8G GPU ~5GB Est. VRAM

Dense SymphonyForConditionalGeneration Architecture

Commercial APACHE License

Model Information Summary
Entity Passport
Registry ID	hf-model--okestro-ai-lab--symphony-asr
License	Apache-2.0
Provider	huggingface

💾

Compute Threshold

~4.9GB VRAM

* Static estimation for 4-Bit Quantization.

📜

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__okestro_ai_lab__symphony_asr,
  author = {Okestro Ai Lab},
  title = {Symphony Asr Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/okestro-ai-lab/SYMPHONY-ASR}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

Okestro Ai Lab. (2026). Symphony Asr [Model]. Free2AITools. https://huggingface.co/okestro-ai-lab/SYMPHONY-ASR

🔬Technical Deep Dive

Quick Commands

🦙 Ollama Run

ollama run symphony-asr

🤗 HF Download

huggingface-cli download okestro-ai-lab/SYMPHONY-ASR

📦 Install Lib

pip install -U transformers

⚖️ Nexus Index V2.0

💬 Index Insight

FNI V2.0 for Symphony Asr: Semantic (S:50), Authority (A:0), Popularity (P:12), Recency (R:100), Quality (Q:65).

Technical Deep Dive

SYMPHONY-ASR is an Automatic Speech Recognition (ASR)-specialized model designed for efficient and accurate speech-to-text transcription.

📖 Model Architecture

🚀 SYMPHONY-ASR uses the SymphonyForConditionalGeneration architecture and, per its Hugging Face tags, is fine-tuned from Qwen/Qwen3-4B. It has about 4.75B parameters and a 4,096-token context window.

📌 Key Features

⚡ Efficient long-form speech processing
🧠 Adaptation of pre-trained LLMs to audio
✅ Evaluated on standard ASR benchmarks (WER figures are not reproduced on this page)
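
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration only: the function name `word_error_rate` is hypothetical, and it applies no text normalization (casing, punctuation), which benchmark evaluations usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; a WER above 1.0 is possible when the hypothesis contains many inserted words.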

🚀 Getting Started

1. Installation

First, install the required libraries.

bash

sudo apt install ffmpeg
# Install the pinned Python dependencies with pip
pip install torch==2.3.1 torchaudio==2.3.1 peft==0.14.0 librosa==0.11.0 \
    transformers==4.53.1 accelerate==0.34.2 einops==0.8.1 \
    openai-whisper soundfile
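
To confirm that an environment matches the pinned versions, a small check along these lines may help. It is a sketch: the `PINNED` table copies the pins from the installation list, and the helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata

# Pins copied from the installation list (unpinned packages are omitted).
PINNED = {
    "torch": "2.3.1",
    "torchaudio": "2.3.1",
    "peft": "0.14.0",
    "librosa": "0.11.0",
    "transformers": "4.53.1",
    "accelerate": "0.34.2",
    "einops": "0.8.1",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu121" that some PyTorch wheels append.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {installed}")
```

The `get_version` parameter exists only so the lookup can be swapped out in tests; in normal use the default `importlib.metadata.version` is sufficient.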

2. Load Model and Tokenizer

You can easily load the model using AutoModelForCausalLM.from_pretrained. This model includes custom code, so the trust_remote_code=True option is required.

python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# ⬅️ Enter your Hugging Face repository ID here.
repo_id = "okestro-ai-lab/SYMPHONY-ASR" 

model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
generation_config = GenerationConfig.from_pretrained(repo_id)

model.eval()

3. Sample Inference (Automatic Speech Recognition)

This example shows how to load an audio file and transcribe it to text.

python

import torch
import librosa

# 1. Load and resample the audio file
# ⬅️ Path to the audio file to be transcribed
wav_path = "sample_audio/English_audio.wav" 
# SYMPHONY-ASR requires 16 kHz audio, so let librosa resample (and downmix to mono) on load.
audio, sample_rate = librosa.load(wav_path, sr=16000)

# 2. Prepare the prompt and tokenize the prompt
# Automatic Speech Recognition (ASR) task
# Additional tasks: see Supported Tasks.
# A task token is optional, but including it helps steer the model toward the intended task.
TASK_TOKEN = "<|ASR|>" 
AUDIO_TOKEN = "<|audio_bos|><|AUDIO|><|audio_eos|>"
user_prompt = f"{TASK_TOKEN}{AUDIO_TOKEN}\nTranscribe the audio clip into text."

prompt = [{"role": "user", "content": user_prompt}]
input_ids = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors='pt'
).to(model.device)

# 3. Perform inference
# The model's generate function expects batched audio, so wrap the single clip as shape (1, num_samples).
audio_tensor = torch.tensor(audio, dtype=torch.float32).unsqueeze(0).to(model.device)

with torch.no_grad():
    with torch.cuda.amp.autocast(dtype=torch.bfloat16):
        output_ids = model.generate(
            input_ids=input_ids,
            audio=audio_tensor,
            generation_config=generation_config,
            max_new_tokens=256
        )

# 4. Decode the result
transcription = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

print("--- Transcription Result ---")
print(transcription)

📌 Supported Tasks

You can perform different tasks by using the following special tokens in your prompt:

<|ASR|>: Automatic Speech Recognition - Transcribes audio into text.
<|AST|>: Automatic Speech Translation - Translates speech into text in another language.
<|SSUM|>: Speech Summarization - Summarizes the content of an audio clip.
<|SQQA|>: Spoken Query-based Question Answering - Answers questions based on the content of an audio clip.
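
The task tokens can be swapped into the prompt layout used in the inference example. The helper below is a sketch only: `build_messages` is a hypothetical name, and the instruction wording for tasks other than ASR is a placeholder, since this page gives example wording only for transcription.

```python
AUDIO_TOKEN = "<|audio_bos|><|AUDIO|><|audio_eos|>"

# Task tokens from the Supported Tasks list.
TASK_TOKENS = {
    "asr": "<|ASR|>",
    "ast": "<|AST|>",
    "ssum": "<|SSUM|>",
    "sqqa": "<|SQQA|>",
}


def build_messages(task: str, instruction: str) -> list[dict]:
    """Build a single-turn chat message list in the layout of the ASR example."""
    try:
        task_token = TASK_TOKENS[task.lower()]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_TOKENS)}"
        ) from None
    content = f"{task_token}{AUDIO_TOKEN}\n{instruction}"
    return [{"role": "user", "content": content}]


# The ASR instruction is the one used in the inference example.
asr_messages = build_messages("asr", "Transcribe the audio clip into text.")
# Placeholder wording (an assumption): no summarization prompt is given on this page.
ssum_messages = build_messages("ssum", "Summarize the audio clip.")
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of `prompt` in the inference example.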

⚡ GPU Requirements

SYMPHONY-ASR inference requires a GPU with sufficient memory.

Task	Recommended GPU	Minimum VRAM
Inference	NVIDIA A100 / H100	≥ 11.8 GB

💡 Using mixed precision (bfloat16 or fp16) is recommended to reduce memory usage.
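
As a rough sanity check on these figures, the weights alone occupy roughly (number of parameters) × (bytes per parameter). The sketch below uses the 4.75B parameter count listed on this page; it ignores activations, the KV cache, and framework overhead, so real usage is higher. The function name is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * BYTES_PER_PARAM[dtype]


for dtype in ("float32", "bfloat16", "int4"):
    print(f"{dtype}: ~{weight_memory_gb(4.75, dtype):.2f} GB")
```

At bfloat16 the weights come to about 9.5 GB, which is plausibly consistent with the ≥ 11.8 GB minimum once runtime overhead is added.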

⚠️ Incomplete Data

Some information about this model is not available. Use with Caution - Verify details from the original source before relying on this data.

📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
• The license is listed as Apache-2.0; verify the terms in the original repository before commercial use.

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--okestro-ai-lab--symphony-asr
slug: okestro-ai-lab--symphony-asr
source: huggingface
author: Okestro Ai Lab
license: Apache-2.0
tags: transformers, safetensors, symphony, feature-extraction, audio, text-generation, audio-text-to-text, custom_code, en, ko, base_model:qwen/qwen3-4b, base_model:finetune:qwen/qwen3-4b, license:apache-2.0, model-index, region:us, eval-results

⚙️ Technical Specs

architecture: SymphonyForConditionalGeneration
params billions: 4.75
context length: 4,096
pipeline tag: audio-text-to-text
vram gb: 4.9
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
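
The listed estimate can be reproduced from the formula above. This sketch simply evaluates that formula; the function and parameter names are illustrative, and the 0.75 GB-per-billion factor is this page's own heuristic for 4-bit quantization rather than a measured value.

```python
def estimate_vram_gb(params_billions: float,
                     gb_per_billion: float = 0.75,
                     kv_cache_gb: float = 0.8,
                     os_overhead_gb: float = 0.5) -> float:
    """Evaluate VRAM ≈ (params * 0.75) + 0.8 GB (KV) + 0.5 GB (OS)."""
    return params_billions * gb_per_billion + kv_cache_gb + os_overhead_gb


estimate = estimate_vram_gb(4.75)
# 4.75 * 0.75 + 0.8 + 0.5 = 4.8625, which rounds to the listed 4.9 GB.
print(f"~{estimate:.1f} GB")
```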

📊 Engagement & Metrics

downloads: 174
stars: 0
forks: 0

Data indexed from public sources. Updated daily.
