📊
Dataset

results

by Community hf-dataset--open-llm-leaderboard-old--results
Nexus Index
21.0 Top 3%
S / A / P / R / Q Breakdown Calibration Pending

Pillar scores are computed during the next indexing cycle.

Tech Context
Vital Performance
0 DL / 30D
0.0%


Data Integrity 21 FNI Score
- Size
- Rows
Parquet Format
- Tokens
Dataset Information Summary
Entity Passport
Registry ID hf-dataset--open-llm-leaderboard-old--results
Provider huggingface
📜

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__open_llm_leaderboard_old__results,
  author = {Community},
  title = {results Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/open-llm-leaderboard-old/results}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Community. (2026). results [Dataset]. Free2AITools. https://huggingface.co/datasets/open-llm-leaderboard-old/results

🔬 Technical Deep Dive


⚖️ Nexus Index V2.0

21.0
ESTIMATED IMPACT TIER
Semantic (S) 50
Authority (A) 0
Popularity (P) 0
Recency (R) 0
Quality (Q) 0

💬 Index Insight

FNI V2.0 for results: Semantic (S:50), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).

Free2AITools Nexus Index

Verification Authority

Unbiased Data Node Refresh: VFS Live
⬇️
Downloads
13,266
❤️
Likes
50

👁️ Data Preview

📊

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.


🧬 Field Logic

Schema not yet indexed for this dataset.

Dataset Specification


language:

  • en

HuggingFace LeaderBoard

Open LLM Leaderboard Results

This repository contains the evaluation results for models submitted to the Open LLM Leaderboard. Our goal is to shed light on cutting-edge Large Language Models (LLMs) and chatbots, so that you can make a well-informed choice for your application.

Evaluation Methodology

Each submitted model is run against several benchmarks from the EleutherAI Language Model Evaluation Harness, a unified framework for measuring the effectiveness of generative language models. Below is a brief overview of each benchmark:

  1. AI2 Reasoning Challenge (ARC) - Grade-School Science Questions (25-shot)
  2. HellaSwag - Commonsense Inference (10-shot)
  3. MMLU - Massive Multitask Language Understanding, knowledge across 57 domains (5-shot)
  4. TruthfulQA - Propensity to Produce Falsehoods (0-shot)
  5. Winogrande - Adversarial Winograd Schema Challenge (5-shot)
  6. GSM8K - Grade School Math Word Problems Requiring Multi-Step Mathematical Reasoning (5-shot)

Together, these benchmarks assess a model's knowledge, reasoning, and basic mathematical ability across a range of scenarios.
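The few-shot settings listed above can be captured in a small lookup table. The sketch below is illustrative only: the `FEW_SHOT` mapping mirrors the list, while the `summarize` helper and its unweighted mean are assumptions, since this page does not say how per-benchmark scores are combined. The example scores are hypothetical placeholders, not real results.

```python
from statistics import mean

# Few-shot settings as listed above: benchmark name -> number of in-context examples.
FEW_SHOT = {
    "ARC": 25,
    "HellaSwag": 10,
    "MMLU": 5,
    "TruthfulQA": 0,
    "Winogrande": 5,
    "GSM8K": 5,
}


def summarize(scores: dict[str, float]) -> float:
    """Return the unweighted mean over all six benchmarks.

    Assumption: equal weighting. The source does not state an aggregation rule.
    """
    missing = set(FEW_SHOT) - set(scores)
    if missing:
        raise ValueError(f"missing benchmark scores: {sorted(missing)}")
    return mean(scores[name] for name in FEW_SHOT)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example_scores = {name: 50.0 for name in FEW_SHOT}
    for name, shots in FEW_SHOT.items():
        print(f"{name}: {shots}-shot, score={example_scores[name]}")
    print("mean:", summarize(example_scores))
```

Raising on a missing benchmark, rather than averaging whatever is present, keeps partially evaluated models from being compared against fully evaluated ones.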

Exploring Model Details

To inspect the inputs and outputs of a specific model, find the "📄" icon next to that model in the leaderboard. Clicking it opens the respective GitHub page, which contains detailed information about the model's behavior during evaluation.

Top Tier

Social Proof

HuggingFace Hub
50 Likes
13.3K Downloads
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


🛡️ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id
hf-dataset--open-llm-leaderboard-old--results
source
huggingface
author
Community
tags
language:en, region:us
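The registry ID above appears to encode the Hugging Face location cited in the BibTeX entry. As a minimal sketch, assuming the ID follows a `hf-dataset--<owner>--<name>` pattern (inferred from this single entry, not from a documented scheme), the source URL could be recovered as follows. The function name `registry_id_to_url` is hypothetical.

```python
def registry_id_to_url(registry_id: str) -> str:
    """Map a registry ID to its Hugging Face dataset URL.

    Assumption: IDs look like 'hf-dataset--<owner>--<name>'. This pattern is
    inferred from one example and may not hold for every entry.
    """
    # Split at most twice so a '--' inside the dataset name is preserved.
    prefix, owner, name = registry_id.split("--", 2)
    if prefix != "hf-dataset":
        raise ValueError(f"unexpected registry prefix: {prefix!r}")
    return f"https://huggingface.co/datasets/{owner}/{name}"


if __name__ == "__main__":
    print(registry_id_to_url("hf-dataset--open-llm-leaderboard-old--results"))
```

An ID with fewer than three `--`-separated parts fails the unpacking with a `ValueError`, which makes malformed IDs surface early.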

⚙️ Technical Specs

architecture
null
params billions
null
context length
null

📊 Engagement & Metrics

likes
50
downloads
13,266

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)