Securereview 7b Mlx 4bit

Name: Securereview 7b Mlx 4bit
Author: vitorallo

Free2AITools Nexus Index

38.9 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 5

R: Recency 98

Q: Quality 65

Tech Context

1.19 Params

32.768K Ctx

Vital Performance

44 DL / 30D

0.0%

8G GPU ~4GB Est. VRAM

Dense QWEN2FORCAUSALLM Architecture

Commercial APACHE License

Model Information Summary
Entity Passport
Registry ID	hf-model--vitorallo--securereview-7b-mlx-4bit
License	Apache-2.0
Provider	huggingface

Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__vitorallo__securereview_7b_mlx_4bit,
  author = {vitorallo},
  title = {Securereview 7b Mlx 4bit Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/vitorallo/securereview-7b-mlx-4bit}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

vitorallo. (2026). Securereview 7b Mlx 4bit [Model]. Free2AITools. https://huggingface.co/vitorallo/securereview-7b-mlx-4bit

Quick Commands

🦙 Ollama Run

ollama run securereview-7b-mlx-4bit

🤗 HF Download

huggingface-cli download vitorallo/securereview-7b-mlx-4bit

Technical Deep Dive

securereview-7b (MLX 4-bit)

A QLoRA fine-tune of Qwen2.5-Coder-7B-Instruct for function-level security code review. The model takes a single function plus a small amount of caller/callee context and emits a structured JSON findings array. Training harness, data pipeline, and evaluation code are open source at github.com/vitorallo/securereview-7b.

In one paragraph

The model is a 16-layer LoRA adapter (rank 16, scale 2.0) fused into a 4-bit AWQ-quantised Qwen2.5-Coder-7B-Instruct base, trained for three epochs on the 9,256-record training split of an 11,792-example supervised corpus. Each example is a three-message chat (system / user / assistant) where the assistant message is a valid JSON findings object. The model was trained with mask_prompt: true, so the cross-entropy loss was computed only on the assistant tokens; the system and user messages act purely as conditioning context, never as an optimisation target. On 200 held-out in-distribution test examples the fine-tune reaches F1 61.1% against base 14.3% (+46.8 pp), with zero false positives across 37 clean examples and a 99% JSON parse rate. Training wall time was ~8 hours on Apple Silicon; external AI API calls were made in a pair of data-generation passes. Everything is Apache-2.0.

Evaluation

Evaluated against the unmodified base model (mlx-community/Qwen2.5-Coder-7B-Instruct-4bit) on a 200-example held-out slice of the in-distribution test split (200 of 1,268 records, stratified over (source, language, primary_category), zero overlap with training):

Metric	base	securereview-7b	Δ
F1	14.3%	61.1%	+46.8 pp
Precision	13.9%	61.5%	+47.6 pp
Recall	14.7%	60.7%	+46.0 pp
False positive rate	70.3%	0.0%	−70.3 pp
JSON parse rate	95.5%	99.0%	+3.5 pp
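As a quick consistency check, the F1 values in the table follow from the listed precision and recall via the harmonic mean. The helper name f1_score below is illustrative and is not taken from the evaluation harness.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above.
base_f1 = f1_score(13.9, 14.7)
tuned_f1 = f1_score(61.5, 60.7)
print(round(base_f1, 1), round(tuned_f1, 1))
```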

Per-CWE class:

Class	base	securereview-7b
Logic bugs (BOLA, auth, race, TOCTOU)	29.2%	58.3%
Pattern bugs (SQLi, XSS, cmd inj, …)	28.9%	71.1%

Per-language recall:

Language	base	securereview-7b
Python	20%	57%
JavaScript	17%	75%
TypeScript	0%	62%
Go	12%	59%
Java	7%	53%
Ruby	27%	73%
Rust	0%	75%
C++	11%	37%
C#	22%	78%

Out-of-distribution benchmark

On a separate 23-example out-of-distribution benchmark (hand-authored DVNA and Juice Shop vulnerabilities) the fine-tune scores F1 26.1% against base 34.0%. The training distribution is CVEFixes commit-level data plus rule-steered synthetic examples; the benchmark is small (n=23) and its code does not resemble the training distribution. Treat the in-distribution number as the meaningful signal. An expanded out-of-distribution benchmark is planned for the next release.

How it was trained

Technique

Supervised fine-tuning via LoRA, implemented with mlx-lm lora. The adapter wraps the attention and feed-forward projections of the last 16 transformer layers at rank 16 and scale 2.0 — roughly 23 million trainable parameters out of the base model's 7.6 billion (0.3%). The optimiser is AdamW at a constant learning rate of 2×10⁻⁴ with no warmup and no schedule. Effective batch size is 4 (1 × 4), context window 4,096 tokens, gradient checkpointing enabled to keep peak memory under 8 GB. Three epochs over 9,256 training records comes to 6,942 iterations.
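The figures in this paragraph can be reproduced with a few lines of arithmetic. This is only a sanity check; it assumes that "1 × 4" means a micro-batch of 1 with 4 gradient-accumulation steps, which the card does not state explicitly.

```python
epochs = 3
train_records = 9_256
micro_batch = 1
grad_accum_steps = 4  # assumption: "1 x 4" = micro-batch x accumulation steps

effective_batch = micro_batch * grad_accum_steps
iterations = epochs * train_records // effective_batch

trainable_params = 23e6   # "roughly 23 million"
base_params = 7.6e9       # "7.6 billion"
trainable_pct = 100 * trainable_params / base_params

print(iterations, round(trainable_pct, 1))
```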

The training objective is plain cross-entropy on the next-token prediction, but with mask_prompt: true the loss is zero everywhere inside the system and user messages. Only the assistant JSON contributes gradients. This means the model is not being trained to write the prompt format — it is being trained to complete the conversation given that format. At inference time, sending a prompt that deviates from the training shape moves the query outside the conditioning region and the model reverts to its base behaviour (see the inference contract below).
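To make the effect of prompt masking concrete, here is a minimal sketch of a masked cross-entropy in plain Python. It is not the mlx-lm implementation; the function name and the toy numbers are assumptions chosen for illustration.

```python
import math


def masked_cross_entropy(token_log_probs, is_assistant_token):
    """Mean negative log-likelihood over assistant tokens only.

    token_log_probs: log-probability the model assigned to each target token.
    is_assistant_token: parallel booleans, True for tokens in the assistant turn.
    """
    kept = [-lp for lp, keep in zip(token_log_probs, is_assistant_token) if keep]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


# Toy sequence: three prompt tokens followed by two assistant tokens.
mask = [False, False, False, True, True]
log_probs = [math.log(0.1), math.log(0.2), math.log(0.3), math.log(0.5), math.log(0.25)]
loss = masked_cross_entropy(log_probs, mask)

# Changing how well the prompt tokens are predicted leaves the loss untouched.
other_prompt = [math.log(0.9), math.log(0.9), math.log(0.9), math.log(0.5), math.log(0.25)]
loss_other_prompt = masked_cross_entropy(other_prompt, mask)
```

Because the prompt positions never enter the sum, no gradient flows from them, which is the behaviour the paragraph above describes.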

Data

The training corpus is 11,792 examples across nine languages, split 80/10/10 stratified on (source, language, primary_category):

Source	Records	Notes
`cvefixes`	7,846	Real vulnerable functions from CVEFixes v1.0.8 (Zenodo)
`synthetic`	2,723	Sonnet-generated vulnerable/fixed pairs steered by rule YAML
`synthetic_logic`	276	Sonnet-generated logic-intent examples (missing-check framing)
`claude_rules`	537	Parsed from TikiTribe/claude-secure-coding-rules
`safe_logic`	186	DeepSeek-generated clean code demonstrating affirmative patterns
`clean`	224	DeepSeek-generated clean-code negatives
Total	11,792

Languages: Python, JavaScript, TypeScript, Go, Java, Ruby, Rust, C++, C#. Every source is permissively licensed; the dataset inherits Apache-2.0.
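A stratified split of this kind can be sketched as follows. This is an illustration under stated assumptions, not the repository's assembly step: the function name, the seed handling, and the choice to give rounding remainders to the training split are all hypothetical. Per-stratum rounding means the overall ratio is only approximately 80/10/10.

```python
import random
from collections import defaultdict


def stratified_split(records, seed=0, val_frac=0.1, test_frac=0.1):
    """Split records into train/val/test per (source, language, primary_category) stratum."""
    strata = defaultdict(list)
    for rec in records:
        key = (rec["source"], rec["language"], rec["primary_category"])
        strata[key].append(rec)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for key in sorted(strata):  # sorted so the result is deterministic for a given seed
        group = list(strata[key])
        rng.shuffle(group)
        n_val = int(len(group) * val_frac)
        n_test = int(len(group) * test_frac)
        n_train = len(group) - n_val - n_test  # remainder goes to train (assumption)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test


# Toy corpus: two strata of 20 and 10 records.
demo = [
    {"id": i, "source": "cvefixes", "language": "Python", "primary_category": "SQL Injection"}
    for i in range(20)
] + [
    {"id": 100 + i, "source": "synthetic", "language": "Go", "primary_category": "XSS"}
    for i in range(10)
]
train, val, test = stratified_split(demo, seed=42)
```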

Credits

TikiTribe/claude-secure-coding-rules — the claude_rules source (537 training records) was parsed from this community-maintained corpus of 100+ security rule sets covering 12 languages, 5 backend frameworks, OWASP Top 10 2025, NIST AI RMF, MITRE ATLAS, and Google SAIF. Every rule ships with structured Do/Don't/Why/Refs sections and CWE + OWASP tags that slot directly into our training schema with no API calls. If you build your own security-review model, this is a very good place to start.
CVEFixes v1.0.8: only 5,000+ real CVE commits were extracted (to avoid unbalancing the training set), together with the before/after function pairs that underpin the bulk of the training corpus. Method and citation at arxiv.org/abs/2107.08760.
Qwen2.5-Coder-7B-Instruct: the base model. Licensed under Apache-2.0; see the Qwen model card for the authoritative terms.

Memorisation defence

All user-visible finding fields (description, recommendation, rules_text) are generated by the pipeline rather than copied from the source CVE metadata. Descriptions are built from 12 short phrasings rotated by body hash and interpolated with the vulnerable line number and code snippet. Recommendations combine a per-record prefix (function name plus line number) with a category-specific advice pool whose entries are all ≤25 characters long. The constraint is that no fixed 30-character window of assistant-message text may repeat in more than a per-field threshold of records — description × 20, recommendation × 20, code × 10. A pre-flight probe (pipeline/memorisation_probe.py) walks the final training split and refuses to launch training if any substring crosses these limits, and a hard regex list catches any CVE identifier or CVE-description prose that could otherwise slip through.
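The sliding-window check described above can be sketched in a few lines. This is a simplified illustration, not the contents of pipeline/memorisation_probe.py; the function name and record layout are assumptions, while the window size and per-field thresholds are the ones quoted in the paragraph.

```python
from collections import Counter

WINDOW = 30
THRESHOLDS = {"description": 20, "recommendation": 20, "code": 10}


def find_violations(records, window=WINDOW, thresholds=THRESHOLDS):
    """Return {field: [windows]} for windows that appear in more records than allowed."""
    violations = {}
    for field, limit in thresholds.items():
        counts = Counter()
        for rec in records:
            text = rec.get(field, "")
            # Count each window once per record, so the tally is a number of records.
            seen = {text[i:i + window] for i in range(len(text) - window + 1)}
            counts.update(seen)
        over = sorted(w for w, n in counts.items() if n > limit)
        if over:
            violations[field] = over
    return violations


repeated = "User input flows into the SQL query unsanitised."
ok_records = [{"description": repeated} for _ in range(20)]   # at the limit
bad_records = [{"description": repeated} for _ in range(21)]  # one over the limit
```

A pre-flight gate would then refuse to start training whenever find_violations returns a non-empty dict.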

This precaution was added after an earlier iteration that stored CVEFixes descriptions verbatim taught the model to regurgitate them on arbitrary user code. See the development story below.

Development story (M1 → M3)

The current release is the third serious training run. Each earlier attempt taught something specific that shaped the data pipeline this release was trained on. The iterations are recorded here so that the evaluation numbers land in context rather than arriving out of thin air.

M1 — assemble the dataset

The pipeline was scaffolded from scratch: a schema matching the intended output, six data generators (CVEFixes SQLite ingest, rule-steered synthetic generation, vulnerable-app extraction, clean code negatives, etc.), and a stratified 80/10/10 assembly step with body-hash dedup and leakage checks. The M1 milestone finished with 11,327 records and a clean validator pass. No training yet.

M2 — first SFT pass (withdrawn)

The first fine-tune finished cleanly: 3 epochs, val loss 1.722 → 0.239, no errors. Then the evaluation harness scored it 15 F1 points below the base model on the held-out benchmark. Inspecting the outputs showed the model emitting phrases like:

"description": "[{'lang': 'en', 'value': 'Multiple SQL injection vulnerabilities in register.php in GeniXCMS before 1.0.0 allow remote attackers to execute arbitrary SQL commands via unspecified vectors.'}]"

on a simple DVNA function. The problem was traced to the CVEFixes ingest: the v1.0.8 Zenodo dump stores cve.description as a Python repr of a list of dicts, not as plain text. The M1 pipeline stored that raw string as the training label. Because CVEFixes was 78% of the training data and every record shared the same corrupted format, the model learned the dominant label shape and replayed it. The ingest was patched to call ast.literal_eval on the column and extract the English description, and the templated "Reference fix commit from {cve_id}" recommendation was replaced with a varied pool to reduce memorisation pressure.
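The ingest fix can be illustrated with a short sketch. The function name and fallback behaviour are assumptions, not the repository's code; only the use of ast.literal_eval and the list-of-dicts shape come from the description above.

```python
import ast


def extract_english_description(raw: str) -> str:
    """Parse a CVEFixes-style description column and return the English text.

    Falls back to the raw string when it is not a Python literal of the expected shape.
    """
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw
    if isinstance(parsed, list):
        for entry in parsed:
            if isinstance(entry, dict) and entry.get("lang") == "en":
                return str(entry.get("value", ""))
    return raw


# Illustrative value in the same shape as the corrupted label shown above.
raw_column = "[{'lang': 'en', 'value': 'Example description text.'}]"
clean = extract_english_description(raw_column)
```

ast.literal_eval only evaluates Python literals, so it is safe to run on untrusted column values in a way that eval would not be.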

M2.5 — data cleanup and bundled improvements

The second iteration fixed the description corruption and, in the same run, added several improvements drawn from the M2 post-mortem:

A sink-pattern line picker (SINK_PATTERNS_BY_CWE) that points at the actual vulnerable line in the body rather than falling back to line 1 on the function signature. Dropped line == 1 records from ~78% to ~5%.
A text-match caller/callee heuristic for CVEFixes so records carry some surrounding context, not just the isolated vulnerable function.
Per-CVE rules_text synthesis through a concurrent DeepSeek V3.2 prefetch (10 worker threads against OpenRouter), bringing the rules_text coverage from 0% to 100% of CVEFixes records.
A new safe_logic_gen.py source producing 186 clean-code affirmative-pattern examples (ownership-filtered queries, role-gated routes, race-safe balance updates) so the model learns what correct logic looks like alongside what broken logic looks like.
A loosened claude_rules_ingest.py parser that recovers more logic-rule examples from the public corpus.
Type-coercion defences in the training-record builder so LLM output with the wrong primitive type (string confidence, missing severity) no longer crashes the validator.
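The sink-pattern line picker from the first item can be sketched as below. The pattern table here is hypothetical and far smaller than a real one would be; the actual contents of SINK_PATTERNS_BY_CWE are not shown in this card, and the function name is an assumption.

```python
import re

# Hypothetical patterns for illustration only.
SINK_PATTERNS_BY_CWE = {
    "CWE-89": [re.compile(r"\.execute\s*\(")],
    "CWE-78": [re.compile(r"os\.system\s*\(")],
}


def pick_vulnerable_line(body: str, cwe_id: str) -> int:
    """Return the 1-based line number of the first sink match, or 1 as a fallback."""
    patterns = SINK_PATTERNS_BY_CWE.get(cwe_id, [])
    for lineno, line in enumerate(body.splitlines(), start=1):
        if any(p.search(line) for p in patterns):
            return lineno
    return 1


body = (
    "def find_user(conn, name):\n"
    "    query = 'SELECT * FROM users WHERE name = ' + name\n"
    "    return conn.execute(query)\n"
)
line = pick_vulnerable_line(body, "CWE-89")
```

Falling back to line 1 only when no pattern matches is what keeps the share of line == 1 labels low.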

M2.5 retrained on the new data and improved on M2 from F1 18.6% to F1 22.2% on the same benchmark. A raw-output diff against the base model revealed two new problems that were masking the underlying progress; both were fixed in the next iteration.

When the stop-token fix (described under M3 below) was applied retroactively at inference time, M2.5's test-split F1 jumped from 30.0% to 60.1%, evidence that the adapter had learned the right thing but the decoder was throwing half the output away.

M3 — this release

The current release rebuilds the data against these lessons:

Per-record interpolated descriptions. 12 phrasings, each with ≤25 characters of fixed text between interpolation points. The vulnerable snippet, line number, and category are woven through the template so no long fixed window can repeat across the stratum.
Per-record interpolated recommendations. Five prefix templates interpolate the function name and line number; ten category-specific advice pools provide ≤25-character suffixes. The full recommendation is nearly unique per record.
Strict CVE sanitisation. Any CVE identifier in any assistant-message field is a hard-forbidden pattern. CVE prose from the source cve.description column never reaches the training labels. Code comments referencing CVE identifiers are stripped from the body before hashing so commit-message annotations in the CVEFixes corpus do not train the model to condition on CVE-shaped comments.
Dedicated logic-intent generation pass. A new --logic-only mode on the synthetic generator asks the model for examples where the vulnerability is a missing check (ownership filter, role gate, mass-assignment allowlist, session verification) rather than a bad API call. 276 new records added to the dataset.
Stop-token fix as a first-class artefact. generation_config.json and config.json in this published model both specify eos_token_id: [151643, 151645], so any MLX-LM loader reading the generation config gets the correct stop set without a runtime patch. The evaluation harness and the fused-model smoke test both include regression guards that fail loudly if the !<|im_end|> cycling pattern ever returns.
Pre-flight memorisation probe. pipeline/memorisation_probe.py walks train.jsonl before every training run and refuses to proceed if any 30-character substring in an assistant-message field repeats more than the per-field threshold.
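The CVE sanitisation item lends itself to a small sketch. The regular expressions and function names are assumptions: the identifier shape is taken to be "CVE-", a four-digit year, then four or more digits, and only whole-line "#" and "//" comments are handled.

```python
import re

# Assumed identifier shape: CVE-YYYY-NNNN (four or more trailing digits).
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)
# Assumption: only whole-line "#" and "//" comments are considered here.
CVE_COMMENT = re.compile(r"\s*(#|//).*CVE-\d{4}-\d{4,}", re.IGNORECASE)


def contains_cve_id(text: str) -> bool:
    """True if any CVE identifier appears in an assistant-message field."""
    return CVE_ID.search(text) is not None


def strip_cve_comments(body: str) -> str:
    """Drop whole-line comments that mention a CVE identifier."""
    kept = [line for line in body.splitlines() if not CVE_COMMENT.match(line)]
    return "\n".join(kept)


source = "# fix for CVE-2021-12345\nx = 1\n// see cve-2020-0001 upstream\ny = 2"
cleaned = strip_cve_comments(source)
```

A hard-forbidden check would run contains_cve_id over every assistant field and reject the record on any hit.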

Final val loss: 0.214 (started at 1.521). In-distribution F1 61.1%, zero false positives, 99% JSON parse rate. See the evaluation section for the full breakdown.

Inference contract

The model was trained with mask_prompt: true, so the loss was computed only on the assistant JSON. Every training example paired a specific system message with a structured user prompt. Using a shorter or differently-shaped prompt moves inference outside the conditioning region the model was trained on and recovers base-model behaviour (English prose, no JSON).

System message (exact string, 23 words)

text

You are a JSON API that performs security code review. You only output valid JSON. Never output markdown, explanations, or text outside JSON.

User prompt structure

text

Analyze this function for security vulnerabilities.

Function: 
File: ->
Role: 
Called by: 
Calls: 

Security checks:                                   # optional but recommended


Code:

```

Respond with ONLY a valid JSON object: {"findings": [{"severity": "HIGH|MEDIUM|LOW", "category": "...", "line": integer, "code": "...", "description": "...", "recommendation": "...", "confidence": 0.0-1.0, "cwe_id": "CWE-xxx"}]}

If no vulnerabilities: {"findings": []} Use standard vulnerability class names for category: SQL Injection, Command Injection, XSS, CSRF, IDOR, Open Redirect, Path Traversal, XXE, Insecure Deserialization, Broken Authentication, Broken Access Control, Security Misconfiguration, Sensitive Data Exposure, Insufficient Logging. Do NOT use rule IDs as category. Only report real, exploitable vulnerabilities with specific line numbers.
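On the receiving side, a caller may want to check the response against the schema stated in the prompt before trusting it. The following is a minimal sketch using only the standard library; the function name is hypothetical, and "CWE-xxx" is assumed to mean "CWE-" followed by digits.

```python
import json
import re

SEVERITIES = {"HIGH", "MEDIUM", "LOW"}
REQUIRED = ("severity", "category", "line", "code", "description",
            "recommendation", "confidence", "cwe_id")
CWE_ID = re.compile(r"CWE-\d+")  # assumption about the "CWE-xxx" shape


def parse_findings(raw: str) -> list:
    """Parse model output and return the findings list; raise ValueError on schema errors."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or not isinstance(payload.get("findings"), list):
        raise ValueError('expected an object with a "findings" list')
    for finding in payload["findings"]:
        if not isinstance(finding, dict):
            raise ValueError("each finding must be an object")
        missing = [key for key in REQUIRED if key not in finding]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        if finding["severity"] not in SEVERITIES:
            raise ValueError(f"bad severity: {finding['severity']!r}")
        line = finding["line"]
        if isinstance(line, bool) or not isinstance(line, int):
            raise ValueError("line must be an integer")
        conf = finding["confidence"]
        if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be a number in [0.0, 1.0]")
        if not CWE_ID.fullmatch(str(finding["cwe_id"])):
            raise ValueError(f"bad cwe_id: {finding['cwe_id']!r}")
    return payload["findings"]


sample = json.dumps({"findings": [{
    "severity": "HIGH", "category": "SQL Injection", "line": 3,
    "code": "conn.execute(query)", "description": "d", "recommendation": "r",
    "confidence": 0.9, "cwe_id": "CWE-89",
}]})
findings = parse_findings(sample)
```

Rejecting malformed output early keeps prose responses (the base-model fallback mentioned above) from being mistaken for an empty findings list.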

Stop-token handling

Qwen2.5-Coder-Instruct uses <|im_end|> (token id 151645) as its chat-template EOS. This model ships with generation_config.json and config.json both specifying eos_token_id: [151643, 151645], so mlx-lm 0.20+ reads both values and stops cleanly. If your loader does not read the generation config, patch the stop set at runtime:

python

if hasattr(tokenizer, "eos_token_ids") and 151645 not in tokenizer.eos_token_ids:
    tokenizer.eos_token_ids.add(151645)

Important note:

Downloading the model

There are two common ways to fetch this model.

Direct load via `mlx-lm`

If mlx-lm is already installed, it will download the model into the HuggingFace cache (~/.cache/huggingface/hub/) on first use and load it from there on subsequent runs:

python

from mlx_lm import load
model, tokenizer = load("vitorallo/securereview-7b-mlx-4bit")

No separate download step, no extra CLI install.

Explicit pre-fetch via `uvx hf download`

For air-gapped deployments, CI caches, or container image builds, pre-fetch the model into a local directory with the hf CLI via uvx (no pip install huggingface_hub required):

bash

uvx hf download vitorallo/securereview-7b-mlx-4bit \
    --local-dir ./securereview-7b-mlx-4bit

# Then load from the local path
python -c "
from mlx_lm import load
model, tok = load('./securereview-7b-mlx-4bit')
"

uvx runs the latest hf CLI from PyPI in an isolated environment. See the HuggingFace CLI guide for the full command reference.

Hardware requirements

Apple Silicon (MLX): M1 Pro or later recommended. ~4 GB disk for the 4-bit quantised model, ~8 GB peak RAM during inference. Throughput on a baseline M-series is ~0.14 examples/second (~7 seconds per function at 512 max tokens); ~1–2 seconds per function on Pro/Max class hardware.
x86 / CUDA: this repository is the MLX variant. A Transformers variant may follow. In the meantime the training adapter can be paired with a different base via standard PEFT tooling; see the source repository.

Limitations

Single-function scope. The model sees one function per request. Cross-file vulnerabilities (e.g. unsafe deserialisation in module A triggered from module B) are out of scope for this release.
Context ceiling. Training used max_seq_length=4096. Functions larger than roughly 1,200 lines will be truncated; behaviour on truncated input is undefined.
Detection only. The recommendation field produces short pointers (≤25 characters of fixed advice plus per-record interpolated function name and line number), not full patches. A dedicated remediation adapter is planned.
Distribution shift on hand-authored benchmarks. As noted in the evaluation section, the fine-tune scores below base on a 23-example out-of-distribution benchmark. Do not extrapolate the in-distribution result to wildly different code distributions without re-evaluating.
Human review still required. This is a semantic-layer reviewer that complements, not replaces, static analysis tools such as Semgrep, CodeQL, or Bandit. Any finding it emits should be reviewed by a human before acting on it.

Reproducibility

Every artifact in this repository is regenerable from the code at github.com/vitorallo/securereview-7b plus the public data sources listed in the training-data section. The pipeline is deterministic given the random seed and API call cache; the training config is committed; the evaluation harness is committed; the memorisation probe is committed. See the source repository for full reproduction instructions.

Citation

text

@misc{securereview-7b-2026,
  author = {Vito Rallo},
  title  = {securereview-7b: a 7B fine-tune for structured security code review},
  year   = {2026},
  url    = {https://huggingface.co/vitorallo/securereview-7b-mlx-4bit}
}

License

Apache-2.0. The base model (Qwen2.5-Coder-7B-Instruct) is released under its own license; see the Qwen model card. No non-commercial or share-alike data was used in the training corpus.


🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--vitorallo--securereview-7b-mlx-4bit
slug: vitorallo--securereview-7b-mlx-4bit
source: huggingface
author: vitorallo
license: Apache-2.0
tags: mlx, safetensors, qwen2, security, code-review, vulnerability-detection, qwen2.5, qlora, sast, cwe, text-generation, conversational, code, arxiv:2107.08760, license:apache-2.0, 4-bit, region:us

⚙️ Technical Specs

architecture: Qwen2ForCausalLM
params billions: 1.19
context length: 32,768
pipeline tag: text-generation
vram gb: 3.4
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 2GB (KV) + 0.5GB (OS)

📊 Engagement & Metrics

downloads: 44
stars: 0
forks: 0
