🧠

Model

SM-Selective-ViT-Small-224

Name: SM-Selective-ViT-Small-224
Author: XAFT

Nexus Index

38.4 Top 10%

S: Semantic 50

A: Authority 0

P: Popularity 0

R: Recency 95

Q: Quality 65

Tech Context

0.02B Params

4.096K Ctx

Vital Performance

2 DL / 30D

8G GPU ~2GB Est. VRAM

Dense SMSELECTIVEVITMODELFORCLASSIFICATION Architecture

Commercial MIT License

Model Information Summary
Entity Passport
Registry ID	hf-model--xaft--sm-selective-vit-small-224
License	MIT
Provider	huggingface

💾

Compute Threshold

~1.3GB VRAM

* Static estimation for 4-Bit Quantization.

📜

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__xaft__sm_selective_vit_small_224,
  author = {XAFT},
  title = {SM-Selective-ViT-Small-224},
  year = {2026},
  howpublished = {\url{https://huggingface.co/XAFT/SM-Selective-ViT-Small-224}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

XAFT. (2026). SM-Selective-ViT-Small-224 [Model]. Free2AITools. https://huggingface.co/XAFT/SM-Selective-ViT-Small-224

🔬Technical Deep Dive

Quick Commands

🤗 HF Download

huggingface-cli download XAFT/SM-Selective-ViT-Small-224

📦 Install Lib

pip install -U transformers

Soft-Masked Selective Vision Transformer

Model Description

Soft-Masked Selective Vision Transformer is an efficient Vision Transformer (ViT) designed to reduce the computational cost of self-attention while maintaining competitive accuracy.
The model introduces a patch-selective attention mechanism that lets the transformer focus on the most salient image regions and dynamically disregard less informative patches. Because the cost of full self-attention grows quadratically with the number of patches, attending to fewer of them lowers compute substantially, which makes the model well suited to high-resolution vision tasks and resource-constrained environments.
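
To make the quadratic-cost argument concrete, the following back-of-the-envelope sketch counts the multiply-accumulates (MACs) in the two N×N attention products. The 16×16 patch size, the 384-wide embedding, and the 50% keep ratio are illustrative assumptions, not values from the model card, and `attention_macs` is a hypothetical helper rather than part of the released code.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Multiply-accumulates for the token-mixing part of one attention layer.

    Counts only the two N x N products (Q @ K^T and weights @ V). The linear
    projections are left out because they scale linearly with N.
    """
    return 2 * num_tokens * num_tokens * dim


# Assumed values, not taken from the model card: 16x16 patches on a
# 224x224 image (14 * 14 = 196 patches) and a 384-wide embedding.
num_patches = (224 // 16) ** 2
dim = 384

full = attention_macs(num_patches, dim)
# Hypothetical keep ratio: attend to only half of the patches.
kept = attention_macs(num_patches // 2, dim)

print(f"full attention:      {full:,} MACs")
print(f"half of the patches: {kept:,} MACs")
print(f"cost ratio:          {kept / full:.2f}")
```

Under these assumptions, halving the number of attended patches cuts this term to a quarter. The measured whole-model cost is reported as average GFLOPs under Evaluation Results.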

The model also uses knowledge distillation: a stronger teacher network transfers representational knowledge to the lightweight transformer variants to improve their accuracy.

Intended Use

This model is intended for:

• Image classification tasks
• Deployment in compute- or memory-constrained environments
• High-resolution image processing where standard ViTs are prohibitively expensive
• Research on efficient attention mechanisms and transformer compression

Example Use Cases

• Edge or embedded vision systems
• Large-scale image analysis with reduced inference cost
• Efficient backbones for downstream vision tasks

Training Details

• Training Objective: Cross-entropy loss with optional distillation loss
• Distillation: Teacher–student framework
• Optimization: AdamW
• Training Dataset: ILSVRC 2012
• Evaluation Metrics: Top-1 accuracy, FLOPs, parameter count
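
The card does not spell out the form of the distillation loss. As a minimal sketch, assuming the common soft-label formulation (temperature-scaled KL divergence blended with cross-entropy), the objective could look like the following. The names `alpha` and `temperature`, their default values, and all helper functions are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def cross_entropy(student_logits, label):
    """Negative log-probability the student assigns to the true label."""
    return -log_softmax(student_logits)[label]


def distillation_kl(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on temperature-softened distributions."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


def total_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Cross-entropy alone, or blended with the distillation term if a teacher is given."""
    ce = cross_entropy(student_logits, label)
    if teacher_logits is None:  # distillation is optional
        return ce
    kd = distillation_kl(student_logits, teacher_logits, temperature)
    return (1 - alpha) * ce + alpha * kd


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
print("with teacher:   ", total_loss(student, teacher, label=0))
print("without teacher:", total_loss(student, None, label=0))
```

Passing `None` for the teacher logits reduces the objective to plain cross-entropy, which mirrors the "optional" wording in the training details.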

Usage

Image Classification Example

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

MODEL_ID = "XAFT/SM-Selective-ViT-Small-224"

# Load a sample image and make sure it has three channels
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load processor and model (the repository ships custom modeling code)
processor = AutoImageProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForImageClassification.from_pretrained(MODEL_ID, trust_remote_code=True)

# FP16 enables FlashAttention, which needs a CUDA GPU; fall back to FP32 on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.half if device == "cuda" else torch.float32
model = model.to(device=device, dtype=dtype).eval()

# Preprocess, then move the tensors to the model's device and dtype
inputs = processor(images=image, return_tensors="pt")
inputs = inputs.to(device)
inputs = inputs.to(dtype)

# Forward pass without gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(-1).item()

print("Predicted class index:", predicted_class)
```

Evaluation Results

| Model | Top-1 Acc. (%) | Top-5 Acc. (%) | Params (M) | Avg. GFLOPs |
|---|---|---|---|---|
| Base | 80.350 | 94.980 | 86.60 | 9.61 |
| Base (distilled) | 80.990 | 95.386 | 87.37 | 9.21 |
| Small | 78.662 | 94.454 | 22.06 | 3.12 |
| Small (distilled) | 79.000 | 94.494 | 22.45 | 3.05 |
| Tiny tall | 74.802 | 92.794 | 11.07 | 1.64 |
| Tiny tall (distilled) | 75.676 | 92.988 | 11.26 | 1.64 |
| Tiny | 71.056 | 90.192 | 5.72 | 0.95 |
| Tiny (distilled) | 72.618 | 91.338 | 5.92 | 0.93 |
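
For reference, Top-1 and Top-5 accuracy count a prediction as correct when the true label is among the k highest-scoring classes. The sketch below is a minimal illustration with made-up logits (not model outputs); `top_k_accuracy` is a hypothetical helper and not the authors' evaluation script.

```python
def top_k_accuracy(logits_per_image, labels, k):
    """Fraction of images whose true label is among the k highest logits."""
    hits = 0
    for logits, label in zip(logits_per_image, labels):
        # Class indices sorted from highest to lowest score
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores for three images over four classes
logits = [
    [0.1, 2.0, 0.3, 0.0],  # highest score: class 1
    [1.5, 0.2, 1.4, 0.1],  # highest score: class 0, runner-up: class 2
    [0.0, 0.1, 0.2, 3.0],  # highest score: class 3
]
labels = [1, 2, 0]

print("top-1:", top_k_accuracy(logits, labels, k=1))
print("top-2:", top_k_accuracy(logits, labels, k=2))
```

Here one of three images is correct at k=1 and two of three at k=2, which shows why Top-5 figures in the table are always at least as high as Top-1.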

Acknowledgments

We thank the TPU Research Cloud program for providing cloud TPUs that were used to build and train the models for our extensive experiments.

Citation

If you find our work helpful, please cite it.

```bibtex
@article{TOULAOUI2026115151,
    title = {Efficient vision transformers via patch selective soft-masked attention and knowledge distillation},
    journal = {Applied Soft Computing},
    pages = {115151},
    year = {2026},
    issn = {1568-4946},
    doi = {10.1016/j.asoc.2026.115151},
    url = {https://www.sciencedirect.com/science/article/pii/S1568494626005995},
    author = {Abdelfattah Toulaoui and Hamza Khalfi and Imad Hafidi},
    keywords = {Vision transformer, Patch selection, Soft masking, Efficient inference}
}
```

⚠️ Incomplete Data

Some information about this model is not available. Use with caution and verify details against the original source before relying on this data.

📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
• License: listed as MIT in the Hugging Face metadata; verify the licensing terms in the original repository before commercial use.

🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

🆔 Identity & Source

id: hf-model--xaft--sm-selective-vit-small-224
slug: xaft--sm-selective-vit-small-224
source: huggingface
author: XAFT
license: MIT
tags: transformers, safetensors, softmasked_selective_vit, image-classification, vision-transformer, efficient-transformer, selective-attention, knowledge-distillation, computer-vision, custom_code, license:mit, region:us

⚙️ Technical Specs

architecture: SMSelectiveViTModelForClassification
params billions: 0.02
context length: 4,096
pipeline tag: image-classification
vram gb: 1.3
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
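
The listed formula can be checked by hand. The sketch below simply plugs the reported parameter count into it; `estimate_vram_gb` is a hypothetical helper, and the 0.75, 0.8 GB, and 0.5 GB constants are taken verbatim from the formula. An image classifier keeps no autoregressive KV cache, so that term is best read as a fixed overhead in the site's generic estimate.

```python
def estimate_vram_gb(params_billions, kv_gb=0.8, os_gb=0.5):
    """Apply the listed formula: (params * 0.75) + KV term + OS term, in GB."""
    return params_billions * 0.75 + kv_gb + os_gb


estimate = estimate_vram_gb(0.02)  # 0.02B parameters, as listed above
print(f"estimated VRAM: {estimate:.3f} GB")
print(f"rounded:        {round(estimate, 1)} GB")
```

The result rounds to the 1.3 GB shown in the specs, and almost all of it comes from the two fixed terms rather than from the model weights.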

📊 Engagement & Metrics

downloads: 2
stars: 0
forks: 0

Data indexed from public sources. Updated daily.
