🧠

Model

Huihui Gemma 4 26b A4b It Abliterated Nvfp4

Name: Huihui Gemma 4 26b A4b It Abliterated Nvfp4
Author: sakamakismile

by sakamakismile hf-model--sakamakismile--huihui-gemma-4-26b-a4b-it-abliterated-nvfp4

Nexus Index

38.5 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 5

R: Recency 98

Q: Quality 65

Tech Context

26 Params

4.096K Ctx

Vital Performance

48 DL / 30D

0.0%

Source →

Audited 38.5 FNI Score

26B Params

4k Context

48 Downloads

24G GPU ~21GB Est. VRAM

Commercial APACHE License

Model Information Summary
Entity Passport
Registry ID	hf-model--sakamakismile--huihui-gemma-4-26b-a4b-it-abliterated-nvfp4
License	Apache-2.0
Provider	huggingface

💾

Compute Threshold

~20.8GB VRAM

Interactive

Analyze Hardware

Hardware Compatibility Test

▼

* Static estimation for 4-Bit Quantization.

📜

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__sakamakismile__huihui_gemma_4_26b_a4b_it_abliterated_nvfp4,
  author = {sakamakismile},
  title = {Huihui Gemma 4 26b A4b It Abliterated Nvfp4 Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/sakamakismile/huihui-gemma-4-26b-a4b-it-abliterated-nvfp4}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

sakamakismile. (2026). Huihui Gemma 4 26b A4b It Abliterated Nvfp4 [Model]. Free2AITools. https://huggingface.co/sakamakismile/huihui-gemma-4-26b-a4b-it-abliterated-nvfp4

🔬Technical Deep Dive

Full Specifications [+]

Quick Commands

🦙 Ollama Run

ollama run huihui-gemma-4-26b-a4b-it-abliterated-nvfp4

🤗 HF Download

huggingface-cli download sakamakismile/huihui-gemma-4-26b-a4b-it-abliterated-nvfp4

📦 Install Lib

pip install -U transformers

⚖️ Nexus Index V2.0

Methodology Index Protocol

38.5

TOP 100% SYSTEM IMPACT

Semantic (S) 50

Authority (A) 0

Popularity (P) 5

Recency (R) 98

Quality (Q) 65

💬 Index Insight

FNI V2.0 for Huihui Gemma 4 26b A4b It Abliterated Nvfp4: Semantic (S:50), Authority (A:0), Popularity (P:5), Recency (R:98), Quality (Q:65).

Free2AITools Nexus Index

Verification Authority

HuggingFace API GitHub Metadata Arxiv Citation DB System Audit

Unbiased Data Node Refresh: VFS Live

---

🚀 What's Next?

📊

Find Training Datasets

Discover datasets compatible with this model

📈

Compare Benchmarks

See how this model ranks on standard tests

⚡

Deployment Guide

Understand deployment options

Technical Deep Dive

Huihui-gemma-4-26B-A4B-it-abliterated-NVFP4

NVFP4 quantized version of huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated — an abliterated (uncensored) variant of Google's Gemma 4 26B-A4B Mixture-of-Experts model.

49 GB → 16.5 GB with minimal quality loss. Runs on a single NVIDIA Blackwell GPU.

Key Specs


Base model	huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated
Architecture	Gemma 4 MoE — 25.2B total, 3.8B active per token
Quantization	NVFP4 (W4A4 — weights FP4, activations FP4, scales FP8)
Format	`compressed-tensors` (native vLLM support)
Tool	vllm-project/llm-compressor (main)
Size	16.5 GB (single safetensors shard)
Requires	NVIDIA Blackwell GPU (SM 120), vLLM nightly (cu130)

Quickstart

vLLM (recommended)

bash

vllm serve Lna-Lab/Huihui-gemma-4-26B-A4B-it-abliterated-NVFP4 \
    --max-model-len 8192

No --quantization flag needed — vLLM auto-detects compressed-tensors format.

Docker

bash

docker run --gpus '"device=0"' -p 8016:8016 \
    -v /path/to/model:/models/current:ro \
    --shm-size 16gb \
    vllm/vllm-openai:cu130-nightly \
    vllm serve /models/current --port 8016 --max-model-len 8192

Python (vLLM)

python

from vllm import LLM, SamplingParams

llm = LLM(
    model="Lna-Lab/Huihui-gemma-4-26B-A4B-it-abliterated-NVFP4",
    max_model_len=8192,
    gpu_memory_utilization=0.85,
)

output = llm.generate(
    ["Explain quantum entanglement in simple terms."],
    SamplingParams(max_tokens=256, temperature=0.7),
)
print(output[0].outputs[0].text)

Benchmark

Tested on a single NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), CUDA Graph PIECEWISE mode.

Test	Tokens	Speed	Result
English reasoning	46	137 tok/s	PASS
Code generation	64	125 tok/s	PASS
Long-form output	354	160 tok/s	PASS

Sustained throughput: ~160 tok/s (post-warmup, single GPU).

Quantization Details

Recipe

yaml

default_stage:
  default_modifiers:
    QuantizationModifier:
      targets: [Linear]
      ignore: [lm_head, 're:.*embed.*', 're:.*router', 're:.*vision_tower.*']
      scheme: NVFP4

What's quantized, what's not

Quantized (NVFP4): All Linear layers in the language model, including MoE expert layers
Kept in BF16: lm_head, all embedding layers, MoE routers, entire vision tower

Calibration

Dataset: neuralmagic/calibration (LLM split)
Samples: 20
Max sequence length: 8192
MoE expert calibration handled automatically by llm-compressor's SequentialGemma4TextExperts

Reproduction

python

from datasets import load_dataset
from transformers import AutoProcessor, Gemma4ForConditionalGeneration
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model = Gemma4ForConditionalGeneration.from_pretrained(
    "huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated", dtype="auto"
)
processor = AutoProcessor.from_pretrained(
    "huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated"
)

recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head", "re:.*embed.*", "re:.*router", "re:.*vision_tower.*"],
)

ds = load_dataset("neuralmagic/calibration", name="LLM", split="train[:20]")

def preprocess_function(example):
    messages = [
        {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
        for m in example["messages"]
    ]
    return processor.apply_chat_template(
        messages, return_tensors="pt", padding=False, truncation=True,
        max_length=8192, tokenize=True, add_special_tokens=False,
        return_dict=True, add_generation_prompt=False,
    )

ds = ds.map(preprocess_function, batched=False, remove_columns=ds.column_names)

import torch
def data_collator(batch):
    assert len(batch) == 1
    return {
        key: (torch.tensor(value) if key != "pixel_values"
              else torch.tensor(value, dtype=torch.bfloat16).squeeze(0))
        for key, value in batch[0].items()
    }

oneshot(
    model=model, recipe=recipe, dataset=ds,
    max_seq_length=8192, num_calibration_samples=20,
    data_collator=data_collator,
)

model.save_pretrained("output-NVFP4", save_compressed=True)
processor.save_pretrained("output-NVFP4")

Environment

Package	Version
torch	2.11.0+cu130
transformers	5.5.4
llmcompressor	0.1.dev (main @ 3084520)
compressed-tensors	0.15.1a20260414
safetensors	0.7.0
CUDA	13.0

Requirements

GPU: NVIDIA Blackwell (RTX 5090, RTX PRO 6000, B200, etc.) — NVFP4 requires SM 120
VRAM: ~16 GB minimum
Software: vLLM nightly (cu130 build), or any framework supporting compressed-tensors NVFP4

Notes

This is an abliterated (uncensored) model. The base model has had safety training removed. Use responsibly.
Vision tower is kept in BF16 — multimodal capabilities are preserved at full precision.
NVFP4 is a Blackwell-specific format. This checkpoint will not work on Ampere/Hopper GPUs.

Credits

Base model: huihui-ai (abliteration)
Original model: Google DeepMind (Gemma 4)
Quantization tool: vllm-project/llm-compressor
Quantization method follows RedHatAI/gemma-4-26B-A4B-it-NVFP4 (proven path)

Support the Base Model Author

If you find this model useful, please consider supporting huihui-ai — the creator of the abliterated base model:

Ko-fi: https://ko-fi.com/huihuiai
Bitcoin: bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge

⚠️ Incomplete Data

Some information about this model is not available. Use with Caution - Verify details from the original source before relying on this data.

View Original Source →

📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
⚠ License Unknown: Verify licensing terms before commercial use.

Social Proof

HuggingFace Hub

48Downloads

Hub Discussions

🤗 Data Source: Hugging Face ↗

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology 📚 Knowledge Baseℹ️ Verify with original source

🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--sakamakismile--huihui-gemma-4-26b-a4b-it-abliterated-nvfp4
slug: sakamakismile--huihui-gemma-4-26b-a4b-it-abliterated-nvfp4
source: huggingface
author: sakamakismile
license: Apache-2.0
tags: transformers, safetensors, gemma4, image-text-to-text, nvfp4, quantized, abliterated, vllm, compressed-tensors, blackwell, moe, text-generation, conversational, license:apache-2.0, endpoints_compatible, 8-bit, region:us

⚙️ Technical Specs

architecture: null
params billions: 26
context length: 4,096
pipeline tag: text-generation
vram gb: 20.8
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)

📊 Engagement & Metrics

downloads: 48
stars: 0
forks: null

Data indexed from public sources. Updated daily.

Welcome to Free2AI Tools!

Smart Search

FNI Score

You're All Set!