Qwen3-0.6B-EdgeRazor-1.58bit

Name: Qwen3-0.6B-EdgeRazor-1.58bit
Author: zhangsq-nju


Free2AITools Nexus Index

37.0 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 0

R: Recency 97

Q: Quality 65

Tech Context

0.6B Params

32.768K Ctx

Vital Performance

417 DL / 30D



8G GPU ~3GB Est. VRAM

Dense QWEN3FORCAUSALLM Architecture

Commercial APACHE License

Model Information Summary
Entity Passport
Registry ID	hf-model--zhangsq-nju--qwen3-0.6b-edgerazor-1.58bit
License	Apache-2.0
Provider	huggingface

💾

Compute Threshold

~3GB VRAM


* Static estimation for 4-Bit Quantization.

📜

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__zhangsq_nju__qwen3_0.6b_edgerazor_1.58bit,
  author = {Zhangsq Nju},
  title = {Qwen3 0.6b Edgerazor 1.58bit Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.58bit}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

Zhangsq Nju. (2026). Qwen3 0.6b Edgerazor 1.58bit [Model]. Free2AITools. https://huggingface.co/zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.58bit


Quick Commands

🦙 Ollama Run

ollama run qwen3-0.6b-edgerazor-1.58bit

🤗 HF Download

huggingface-cli download zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.58bit

📦 Install Lib

pip install -U transformers


---


⚡

Technical Deep Dive

EdgeRazor for Lightweight LLMs

Qwen3-0.6B-EdgeRazor-1.58bit

Contents
Model Overview
Model Bit-Widths
Model Performance
Quickstart
Citation

Model Overview

Base Model: Qwen/Qwen3-0.6B
Training: zhangsq-nju/EdgeRazor
Quantization: 1.58-bit for all decoder layers; 4-bit for embedding and lm_head
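
As a rough illustration of what a 1.58-bit weight format can look like: the term is commonly used for ternary weights, since log2(3) ≈ 1.58 bits per value. The sketch below uses absmean scaling in the style of BitNet b1.58. That choice is an assumption made for illustration and is not confirmed to be EdgeRazor's scheme; the function names `ternary_quantize` and `dequantize` are hypothetical.

```python
def ternary_quantize(weights):
    """Map floats to codes in {-1, 0, 1} plus one shared scale.

    Assumption: absmean scaling (BitNet-b1.58 style), used here only
    as an example of a ternary scheme, not as EdgeRazor's method.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        # All-zero input: every code is 0 and the scale is irrelevant.
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Rebuild approximate weights as code * scale."""
    return [c * scale for c in codes]


weights = [0.8, -0.05, -1.2, 0.3]
codes, scale = ternary_quantize(weights)
print("codes:", codes)
print("scale:", scale)
print("restored:", dequantize(codes, scale))
```

Each restored weight is one of `-scale`, `0`, or `+scale`, which is why a ternary tensor needs only about 1.58 bits per weight plus the shared scale.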

Model Bit-Widths

Mixed-Precision Recipe	Bit-Width	This Repo
100% 4-bit + 0% 1.58-bit	4
50% 4-bit + 50% 1.58-bit	2.79
12.5% 4-bit + 87.5% 1.58-bit	1.88
0% 4-bit + 100% 1.58-bit	1.58	✔️
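
The bit-widths in the table are consistent with a simple weighted average of the two precisions. The sketch below reproduces them under that assumption; the helper name `effective_bit_width` is hypothetical, and the source does not state how the percentages are measured (for example, by layer count or by parameter count).

```python
BITS_HIGH = 4.0   # bit-width of the 4-bit portion (from the table)
BITS_LOW = 1.58   # bit-width of the 1.58-bit portion (from the table)


def effective_bit_width(frac_4bit):
    """Weighted-average bit-width for a given fraction of 4-bit weights."""
    if not 0.0 <= frac_4bit <= 1.0:
        raise ValueError("frac_4bit must be between 0 and 1")
    return frac_4bit * BITS_HIGH + (1.0 - frac_4bit) * BITS_LOW


# The four recipes listed in the table.
for frac in (1.0, 0.5, 0.125, 0.0):
    print(f"{frac:.1%} 4-bit -> {effective_bit_width(frac):.2f} bits")
```

For example, the 12.5% recipe gives 0.125 × 4 + 0.875 × 1.58 = 1.8825, which the table rounds to 1.88.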

Model Performance

Models	W-A-KV (bits)	ARC-e	ARC-c	HellaSwag	BoolQ	PIQA	WinoGrande	SIQA	OBQA	TruthfulQA-MC2	Ethics	MMLU	IFEval	GSM8K	HumanEval	Average (↑)
Qwen3-0.6B	16-16-16	56.02	34.04	47.23	64.04	67.36	56.04	39.20	31.20	42.84	47.70	40.12	58.41	41.54	37.20	47.35
EdgeRazor	4-16-16	58.54	33.45	45.04	68.01	68.34	55.72	40.07	33.40	43.69	54.36	39.37	53.42	42.00	34.15	47.83
EdgeRazor	2.79-16-16	51.77	28.33	37.47	70.70	63.71	54.06	40.33	28.20	42.72	55.08	36.85	51.39	26.69	31.10	44.17
EdgeRazor	1.88-16-16	51.22	27.73	34.21	66.91	63.66	53.35	38.43	27.60	43.80	55.92	28.78	42.51	25.09	23.17	41.60
EdgeRazor	1.58-16-16	45.75	25.77	33.89	66.64	60.72	52.33	38.23	29.80	44.40	51.70	32.85	37.34	14.25	23.17	39.77
EdgeRazor	4-8-8	57.79	33.70	45.00	67.49	67.85	55.88	40.17	33.80	43.53	54.09	39.73	53.42	42.00	34.76	47.80
EdgeRazor	2.79-8-8	52.10	28.50	37.36	70.58	63.92	53.12	40.12	28.60	42.82	54.97	36.44	49.54	26.99	32.32	44.10
EdgeRazor	1.88-8-8	51.47	27.99	34.22	66.85	63.49	53.04	38.02	27.40	43.88	55.92	29.56	44.55	25.09	23.17	41.76
EdgeRazor	1.58-8-8	44.87	26.11	33.88	66.73	60.55	51.30	38.28	31.00	44.72	50.76	33.09	38.45	15.01	22.56	39.81
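
To put the averages in perspective, the short sketch below expresses each EdgeRazor configuration as a percentage of the 16-bit Qwen3-0.6B average. The numbers are copied from the Average column above; the helper name `retention` is hypothetical.

```python
BASELINE_AVG = 47.35  # Qwen3-0.6B at W-A-KV 16-16-16

# Average column for each EdgeRazor row, keyed by its W-A-KV setting.
edgerazor_avg = {
    "4-16-16": 47.83,
    "2.79-16-16": 44.17,
    "1.88-16-16": 41.60,
    "1.58-16-16": 39.77,
    "4-8-8": 47.80,
    "2.79-8-8": 44.10,
    "1.88-8-8": 41.76,
    "1.58-8-8": 39.81,
}


def retention(avg, baseline=BASELINE_AVG):
    """Average score as a percentage of the full-precision baseline."""
    return 100.0 * avg / baseline


for config, avg in edgerazor_avg.items():
    print(f"{config:>11}: {avg:.2f} avg, {retention(avg):.1f}% of baseline")
```

Read this way, the 1.58-bit model in this repository keeps roughly 84% of the baseline average, with most of the loss concentrated in GSM8K, IFEval, and HumanEval.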

Quickstart

Install EdgeRazor beforehand if you want weight-activation quantization. The provided weights are already quantized (stored as quantized_weights * scaling_bf16); to enable activation and KV cache quantization, pass trust_remote_code=True when loading the model.

python

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.58bit"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # EdgeRazor n-bit models are trained for instruct (non-thinking) mode only.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parse thinking content (expected to be empty here because enable_thinking=False)
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
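
The quickstart requests up to 32768 new tokens, which equals the listed context length, so prompt plus output can exceed the window. A cautious option is to cap the generation budget by the prompt length, as in the minimal sketch below. The helper name `clamp_new_tokens` is hypothetical, and whether exceeding the window actually degrades output for this model is not stated in the source.

```python
CONTEXT_LENGTH = 32768  # context length listed for this model


def clamp_new_tokens(prompt_tokens, requested=32768, context_length=CONTEXT_LENGTH):
    """Largest generation budget that keeps prompt + output inside the window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, context_length - prompt_tokens)


# Example: a 20-token prompt leaves room for 32748 new tokens.
print(clamp_new_tokens(20))
# A small explicit request is passed through unchanged.
print(clamp_new_tokens(100, requested=512))
```

In the quickstart this could be used as `max_new_tokens=clamp_new_tokens(model_inputs.input_ids.shape[1])`.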

Citation

If you find our project useful in your research, please consider citing our paper:

text

@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}

⚠️ Incomplete Data

Some information about this model is not available. Use with caution and verify details against the original source before relying on this data.

📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
⚠ License: listed as Apache-2.0 in the metadata; verify the licensing terms at the original source before commercial use.


🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.


🆔 Identity & Source

id: hf-model--zhangsq-nju--qwen3-0.6b-edgerazor-1.58bit
slug: zhangsq-nju--qwen3-0.6b-edgerazor-1.58bit
source: huggingface
author: Zhangsq Nju
license: Apache-2.0
tags: transformers, safetensors, qwen3, text-generation, edgerazor, quantization, conversational, custom_code, base_model:qwen/qwen3-0.6b, base_model:finetune:qwen/qwen3-0.6b, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us, arxiv:2605.04062

⚙️ Technical Specs

architecture: Qwen3ForCausalLM
params billions: 0.6
context length: 32,768
pipeline tag: text-generation
vram gb: 3
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 2GB (KV) + 0.5GB (OS)
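
The listed VRAM figure can be reproduced from the formula above. The sketch below assumes that `params` is in billions and that the 0.75 factor is in GB per billion parameters; neither unit is stated explicitly in the metadata, and the function name `estimate_vram_gb` is hypothetical.

```python
def estimate_vram_gb(params_billions, gb_per_billion=0.75, kv_gb=2.0, os_gb=0.5):
    """VRAM ≈ (params * 0.75) + 2 GB (KV) + 0.5 GB (OS), per the listed formula."""
    return params_billions * gb_per_billion + kv_gb + os_gb


estimate = estimate_vram_gb(0.6)
print(f"{estimate:.2f} GB, listed as ~{round(estimate)} GB")
```

For this 0.6B model the weight term is only 0.45 GB, so the fixed KV and OS allowances dominate the estimate.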

📊 Engagement & Metrics

downloads: 417
stars: 0
forks: 0

Data indexed from public sources. Updated daily.