Model

Hallucination Leaderboard

by vectara vectara/hallucination-leaderboard
Free2AITools Nexus Index
48.0
S: Semantic 50

Query-time baseline · scored live at search

A: Authority 59
P: Popularity 70
R: Recency 90
Q: Quality 70
Tech Context
Vital Performance

Task categories from upstream metadata

Chat & Dialogue

Technical Constraints

Experimental / High Latency
FNI Score: 48 (low FNI signal)
Params: - (Tiny)
Context: -
Downloads: 0
License: Apache (commercial)
Model Information Summary
Entity Passport
Registry ID vectara/hallucination-leaderboard
License Apache-2.0
Provider github

Cite this model

Academic & Research Attribution

BibTeX
@misc{vectara_hallucination_leaderboard,
  author = {vectara},
  title = {Hallucination Leaderboard Model},
  year = {2026},
  howpublished = {\url{https://github.com/vectara/hallucination-leaderboard}},
  note = {Accessed via Free2AITools.}
}
APA Style
vectara. (2026). Hallucination Leaderboard [Model]. Free2AITools. https://github.com/vectara/hallucination-leaderboard

Technical Deep Dive


Quick Commands

Git Clone
git clone https://github.com/vectara/hallucination-leaderboard

Index Insight

FNI V2.0 for Hallucination Leaderboard: Authority (A:59), Popularity (P:70), Recency (R:90), Quality (Q:70). Semantic (S) is a query-time baseline scored live at search.


Data Sources / Provenance

Open data · Updated: Live data
---


Technical Deep Dive

Hallucination Leaderboard

A public LLM leaderboard computed using Vectara's Hallucination Evaluation Model (HHEM). It measures how often an LLM introduces hallucinations when summarizing a document. We plan to update it regularly as both HHEM and the LLMs change over time.
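
To make the table columns concrete, here is a minimal sketch of how they could be derived from per-document results. It rests on several assumptions that are not stated above: each summary gets a consistency score between 0 and 1, a summary counts as hallucinated when its score falls below a cutoff (the 0.5 default is an assumed value), refusals are recorded as `None`, and rates are computed over answered documents only. The names `LeaderboardRow` and `summarize_run` are hypothetical and are not part of the Vectara repository.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LeaderboardRow:
    hallucination_rate: float        # percent of answered summaries below the cutoff
    factual_consistency_rate: float  # 100 minus the hallucination rate
    answer_rate: float               # percent of documents that received a summary
    avg_summary_length: float        # mean word count over answered summaries


def summarize_run(
    summaries: Sequence[Optional[str]],
    scores: Sequence[Optional[float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> LeaderboardRow:
    """Aggregate per-document results into one leaderboard-style row."""
    if len(summaries) != len(scores):
        raise ValueError("summaries and scores must have the same length")

    # Keep only documents the model actually summarized.
    answered = [
        (text, score)
        for text, score in zip(summaries, scores)
        if text is not None and score is not None
    ]
    if not answered:
        raise ValueError("the model answered no documents")

    hallucinated = sum(1 for _, score in answered if score < threshold)
    hallucination_rate = 100.0 * hallucinated / len(answered)
    avg_words = sum(len(text.split()) for text, _ in answered) / len(answered)

    return LeaderboardRow(
        hallucination_rate=hallucination_rate,
        factual_consistency_rate=100.0 - hallucination_rate,
        answer_rate=100.0 * len(answered) / len(summaries),
        avg_summary_length=avg_words,
    )


# Toy input: four documents, one refusal, one low-scoring summary.
row = summarize_run(
    summaries=["a b c", "a b", None, "a b c d"],
    scores=[0.9, 0.2, None, 0.8],
)
print(row)
```

Under these assumptions the hallucination rate and the factual consistency rate always sum to 100 %, which matches the pattern in the published rows.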

Feel free to check out the interactive hallucination leaderboard on Hugging Face.

If you are interested in previous versions of this leaderboard:

  1. The first version, based on HHEM-1.0, is available here
  2. The most recent version, based on the previous dataset, is available here

In loving memory of Simon Mark Hughes...

Last updated on April 28, 2026

Plot: hallucination rates of various LLMs

| Model | Hallucination Rate | Factual Consistency Rate | Answer Rate | Average Summary Length (Words) |
|---|---|---|---|---|
| antgroup/finix_s1_32b | 1.8 % | 98.2 % | 99.5 % | 172.4 |
| openai/gpt-5.4-nano-2026-03-17 | 3.1 % | 96.9 % | 100.0 % | 144.4 |
| google/gemini-2.5-flash-lite | 3.3 % | 96.7 % | 99.5 % | 95.7 |
| microsoft/Phi-4 | 3.7 % | 96.3 % | 80.7 % | 120.9 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 4.1 % | 95.9 % | 99.5 % | 64.6 |
| snowflake/snowflake-arctic-instruct | 4.3 % | 95.7 % | 62.7 % | 81.4 |
| google/gemma-3-12b-it | 4.4 % | 95.6 % | 97.4 % | 89.7 |
| mistralai/mistral-large-2411 | 4.5 % | 95.5 % | 99.9 % | 85.0 |
| qwen/qwen3-8b | 4.8 % | 95.2 % | 99.9 % | 83.6 |
| amazon/nova-pro-v1:0 | 5.1 % | 94.9 % | 99.3 % | 66.2 |
amazon/nova-2-lit
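
As a quick sanity check on the rows above, the hallucination rate and the factual consistency rate should sum to 100 % for each model, and the answer rate is worth reading alongside the ranking. The sketch below uses three rows copied from the table; the `rows` layout, the `rates_are_complementary` helper, and the 90 % answer-rate cutoff are illustrative assumptions, not part of the leaderboard's methodology.

```python
# (model, hallucination %, factual consistency %, answer %, avg summary words)
rows = [
    ("antgroup/finix_s1_32b", 1.8, 98.2, 99.5, 172.4),
    ("microsoft/Phi-4", 3.7, 96.3, 80.7, 120.9),
    ("snowflake/snowflake-arctic-instruct", 4.3, 95.7, 62.7, 81.4),
]


def rates_are_complementary(hallucination: float, consistency: float,
                            tol: float = 0.05) -> bool:
    """True when the two rate columns sum to 100 % within rounding error."""
    return abs(hallucination + consistency - 100.0) <= tol


# Every copied row should pass the complementarity check.
for model, hallucination, consistency, _answer, _words in rows:
    assert rates_are_complementary(hallucination, consistency), model

# Rank by hallucination rate (lowest first), then list models whose
# answer rate falls below an arbitrary 90 % cutoff.
ranked = sorted(rows, key=lambda r: r[1])
low_answer = [model for model, _h, _c, answer, _w in ranked if answer < 90.0]
print(low_answer)
```

A model that declines many documents is scored on fewer summaries, so its hallucination rate may not be directly comparable with that of a model that answers nearly everything.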

⚠️ Incomplete Data

Some information about this model is not available. Use with caution: verify details from the original source before relying on this data.


Limitations & Considerations

  • Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • FNI scores are relative rankings and may change as new models are added.
  • License: the upstream metadata lists Apache-2.0; verify licensing terms before commercial use.

Social Proof

GitHub Repository
3.2K Stars
103 Forks
Updated daily

Source summary: Based on GitHub metadata. Not a recommendation.


Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: gh-model--vectara--hallucination-leaderboard
slug: vectara--hallucination-leaderboard
source: github
author: vectara
license: Apache-2.0
tags: generative-ai, hallucinations, llm, python

Technical Specs

architecture: null
params (billions): null
context length: null
pipeline tag: text-generation

Engagement & Metrics

downloads: 0
stars: 3,179
forks: 103

Data indexed from public sources. Updated daily.