Octen Embedding 0.6b Onnx

Name: Octen Embedding 0.6b Onnx
Author: cstr

Free2AITools Nexus Index

38.0 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 10

R: Recency 92

Q: Quality 50

Tech Context

0.6B Params

32.768K Ctx

Vital Performance

131 DL / 30D

0.0%

8G GPU ~3GB Est. VRAM

Dense QWEN3MODEL Architecture

Commercial APACHE License

Model Information Summary
Entity Passport
Registry ID	hf-model--cstr--octen-embedding-0.6b-onnx
License	Apache-2.0
Provider	huggingface

Compute threshold: ~3 GB VRAM (static estimate for 4-bit quantization)

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__cstr__octen_embedding_0.6b_onnx,
  author = {cstr},
  title = {Octen Embedding 0.6b Onnx Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/cstr/Octen-Embedding-0.6B-ONNX}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

cstr. (2026). Octen Embedding 0.6b Onnx [Model]. Free2AITools. https://huggingface.co/cstr/Octen-Embedding-0.6B-ONNX

🔬Technical Deep Dive

Quick Commands

🦙 Ollama Run

ollama run octen-embedding-0.6b-onnx

🤗 HF Download

huggingface-cli download cstr/octen-embedding-0.6b-onnx

Octen-Embedding-0.6B — FP32 ONNX (dynamo export)

ONNX export of Octen/Octen-Embedding-0.6B, a fine-tune of Qwen/Qwen3-Embedding-0.6B for semantic search and retrieval.

This is the full-precision (FP32) reference export. For production use, prefer the INT8 or INT4 variants, which are roughly 2.2× and 2.7× smaller (1.1 GB and 0.9 GB versus 2.4 GB) with negligible quality loss.

Export method

Exported with torch.onnx.export(dynamo=True) (PyTorch 2.9, opset 20).

The dynamo exporter captures the model as an FX graph with symbolic shapes instead of tracing a single eager run. As a result, all internal tensor shapes, including the Qwen3 causal attention mask, carry symbolic batch and sequence dimensions throughout. By contrast, the legacy TorchScript-based torch.onnx.export baked a static batch=1 into the causal-mask BitAnd node, which broke inference for batch > 1.

Dynamic batch verified: batch = 1, 2, 4, 8 all produce correct output shapes.
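
A shape check of this kind can be scripted. The sketch below is not the author's verification script: `check_dynamic_batch` and `fake_run` are hypothetical names, and `fake_run` only mimics the documented output shape so the helper can be exercised without loading the 2.4 GB model.

```python
import numpy as np

def check_dynamic_batch(run, batch_sizes=(1, 2, 4, 8), seq_len=16, dim=1024):
    """Call run(input_ids, attention_mask) for each batch size and check the output shape.

    Returns a dict mapping batch size -> True if the shape is [batch, seq_len, dim].
    """
    results = {}
    for b in batch_sizes:
        ids = np.ones((b, seq_len), dtype=np.int64)   # dummy token ids
        mask = np.ones((b, seq_len), dtype=np.int64)  # no padding
        out = run(ids, mask)
        results[b] = out.shape == (b, seq_len, dim)
    return results

def fake_run(ids, mask):
    # Stand-in for the ONNX session (assumption): zeros with the documented shape.
    return np.zeros((*ids.shape, 1024), dtype=np.float32)

print(check_dynamic_batch(fake_run))  # {1: True, 2: True, 4: True, 8: True}
```

With the real model, `run` would wrap the session call, for example `lambda ids, mask: session.run(None, {"input_ids": ids, "attention_mask": mask})[0]`. This checks output shapes only, not numerical agreement with the PyTorch model.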

Model details

Property	Value
Base model	Qwen3-Embedding-0.6B (Qwen/Qwen3-Embedding-0.6B)
Fine-tune	Octen/Octen-Embedding-0.6B
Parameters	596 M
Embedding dim	1024
Max context	32 768 tokens
Inputs	`input_ids [batch, seq]`, `attention_mask [batch, seq]`
Output	`last_hidden_state [batch, seq, 1024]`
Pooling	Last-token pooling + L2 normalisation (applied by the inference runtime)
File size	~2.4 GB (`model.onnx` + `model.onnx.data`)
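
The pooling row above leaves last-token pooling and L2 normalisation to the inference runtime. A minimal NumPy sketch of that step follows; the function names `last_token_pool` and `l2_normalize` are illustrative and not part of any library, and the toy tensors stand in for real model output. This variant finds the last non-padding position from the mask itself, so it should hold for both left- and right-padded batches.

```python
import numpy as np

def last_token_pool(hidden, mask):
    """Pick the hidden state of the last non-padding token in each row.

    hidden: [batch, seq, dim] float array; mask: [batch, seq] of 0/1.
    """
    seq = mask.shape[1]
    # Index of the last 1 in each row: reverse the row and find the first 1.
    last = seq - 1 - np.argmax(mask[:, ::-1], axis=1)
    return hidden[np.arange(hidden.shape[0]), last]

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length, guarding against zero vectors."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

# Toy input: 2 rows, 3 positions, dim 2; row 0 is right-padded, row 1 is left-padded.
hidden = np.arange(12, dtype=np.float32).reshape(2, 3, 2)
mask = np.array([[1, 1, 0], [0, 1, 1]])

pooled = last_token_pool(hidden, mask)
print(pooled.tolist())  # [[2.0, 3.0], [10.0, 11.0]]
unit = l2_normalize(pooled)
```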

Inference

python

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the fast tokenizer; pad on the right so the last real token of each
# row sits at index (number of real tokens - 1).
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_padding(direction="right", pad_id=0)
tokenizer.enable_truncation(max_length=512)

# model.onnx.data (external weights) must sit next to model.onnx.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

texts = ["semantic search example", "another sentence"]
enc = tokenizer.encode_batch(texts)
ids = np.array([e.ids for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)

# The first output is last_hidden_state.
lhs = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # [batch, seq, 1024]

# Last-token pooling: take the embedding at the last non-padding position.
# This index arithmetic is valid for right padding only.
last_idx = mask.sum(axis=1) - 1
embeddings = lhs[np.arange(len(texts)), last_idx]

# L2 normalise
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
embeddings = embeddings / np.maximum(norms, 1e-8)
print(embeddings.shape)  # (2, 1024)
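
Because the embeddings are L2-normalised, a dot product gives cosine similarity directly, which is the usual basis for the semantic search use case named above. The following is a minimal sketch of ranking documents against a query; `rank_by_cosine` is a hypothetical helper, and the toy 2-dimensional unit vectors are assumptions standing in for real 1024-dimensional embeddings.

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Return (indices, scores) of documents sorted by descending similarity.

    Assumes all vectors are already L2-normalised, so dot product == cosine.
    """
    scores = doc_vecs @ query_vec
    order = np.argsort(-scores)
    return order, scores[order]

# Toy unit vectors stand in for real embeddings (assumption for illustration).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([0.0, 1.0])

order, scores = rank_by_cosine(query, docs)
print(order.tolist(), scores.tolist())  # [1, 2, 0] [1.0, 0.8, 0.0]
```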

Files

File	Size	Description
`model.onnx`	~4 MB	ONNX graph (opset 20, no weights)
`model.onnx.data`	~2.38 GB	External weight data
`tokenizer.json`	11 MB	HuggingFace fast tokenizer
`config.json`	—	Model config
`export_octen_onnx_dynamo.py`	—	Reproduction script (PyTorch ≥ 2.1)

Quantized variants

Repo	Precision	Size	Notes
cstr/octen-embedding-0.6b-onnx	FP32	2.4 GB	This repo — reference
cstr/octen-embedding-0.6b-onnx-int8	INT8 per-channel	1.1 GB	Recommended for most use
cstr/octen-embedding-0.6b-onnx-int4	INT4 MatMulNBits	0.9 GB	Minimum RAM
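
The size reduction can be read off the table with a little arithmetic. The sketch below uses only the listed sizes and the 596 M parameter count from the model details; treating 1 GB as 10^9 bytes is an assumption, so the bytes-per-parameter figures are approximate.

```python
# Sizes in GB as listed in the table above.
sizes_gb = {"FP32": 2.4, "INT8": 1.1, "INT4": 0.9}
n_params = 596e6  # parameter count from the model details

# How many times smaller each variant is than the FP32 reference.
ratios = {name: round(sizes_gb["FP32"] / size, 2) for name, size in sizes_gb.items()}

# Average on-disk bytes per parameter (assumes 1 GB = 1e9 bytes).
bytes_per_param = {name: round(size * 1e9 / n_params, 2) for name, size in sizes_gb.items()}

print(ratios)           # {'FP32': 1.0, 'INT8': 2.18, 'INT4': 2.67}
print(bytes_per_param)  # {'FP32': 4.03, 'INT8': 1.85, 'INT4': 1.51}
```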

License

Apache 2.0 — same as Octen/Octen-Embedding-0.6B and Qwen/Qwen3-Embedding-0.6B.

⚠️ Incomplete Data

Some information about this model is not available. Use with caution and verify details against the original source before relying on this data.

📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
• The upstream repositories list the license as Apache-2.0; verify licensing terms at the original source before commercial use.

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--cstr--octen-embedding-0.6b-onnx
slug: cstr--octen-embedding-0.6b-onnx
source: huggingface
author: cstr
license: Apache-2.0
tags: onnxruntime, onnx, qwen3, embedding, text-embedding, retrieval, sentence-similarity, feature-extraction, en, de, zh, multilingual, base_model:octen/octen-embedding-0.6b, base_model:quantized:octen/octen-embedding-0.6b, license:apache-2.0, region:us

⚙️ Technical Specs

architecture: Qwen3Model
params billions: 0.6
context length: 32,768
pipeline tag: sentence-similarity
vram gb: 3
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 2GB (KV) + 0.5GB (OS)
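
Plugging this model's numbers into the formula above reproduces the listed 3 GB figure. The sketch assumes `params` is expressed in billions and the result in GB, which the formula implies but does not state; `estimate_vram_gb` is a hypothetical helper name.

```python
def estimate_vram_gb(params_billions, kv_gb=2.0, os_gb=0.5, gb_per_billion=0.75):
    """VRAM ≈ (params * 0.75) + 2 GB (KV) + 0.5 GB (OS), per the formula above."""
    return params_billions * gb_per_billion + kv_gb + os_gb

est = estimate_vram_gb(0.6)
print(round(est, 2), round(est))  # 2.95 3
```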

📊 Engagement & Metrics

downloads: 131
stars: 0
forks: 0

Data indexed from public sources. Updated daily.