MultiPL-E
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--nuprl--multipl-e |
| License | mit |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__nuprl__multipl_e,
author = {nuprl},
title = {MultiPL-E Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/nuprl/multipl-e}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Dataset Specification
Dataset Card for MultiPL-E
Dataset Description
- Repository: https://github.com/nuprl/MultiPL-E
- Paper: https://ieeexplore.ieee.org/abstract/document/10103177
- Point of Contact: [email protected], [email protected], [email protected]
Dataset Summary
MultiPL-E is a dataset for evaluating large language models for code generation that supports 22 programming languages. It takes the OpenAI HumanEval and the Mostly Basic Python Programs (MBPP) benchmarks and uses little compilers to translate them to other languages. It is easy to add support for new languages and benchmarks.
The dataset is divided into several configurations named SRCDATA-LANG, where SRCDATA is either "humaneval" or "mbpp" and LANG is one of the supported languages. We use the canonical file extension for each language to identify the language, e.g., "cpp" for C++, "lua" for Lua, "clj" for Clojure, and so on.
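As a quick, hedged illustration (not taken from the dataset card itself), a single configuration can be loaded with the Hugging Face datasets library; the config name "humaneval-cpp" below follows the SRCDATA-LANG scheme described above, and a "test" split is assumed.

```python
# Minimal sketch: load one SRCDATA-LANG configuration with the `datasets` library.
# The config name and the "test" split are assumptions based on the naming scheme
# described above, not quoted from the card.
from datasets import load_dataset

problems = load_dataset("nuprl/MultiPL-E", "humaneval-cpp", split="test")

example = problems[0]
print(example["name"])    # problem identifier
print(example["prompt"])  # the C++ prompt a model is asked to complete
```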
Using MultiPL-E
MultiPL-E is part of the BigCode Code Generation LM Harness. This is the easiest way to use MultiPL-E.
MultiPL-E has its own evaluation framework that supports proprietary models, the prompt ablations, more source benchmarks, and more recently added programming languages. See the MultiPL-E tutorial on how to use this framework directly.
The MultiPL-E Ablations
The MultiPL-E paper presented several ablations of the prompt for the original
set of programming languages. We do not include them in the current version of
MultiPL-E, but they are still available in this repository from revision
d23b094 or earlier. (You can optionally pass the revision to
datasets.load_dataset.)
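A hedged sketch of what that looks like, assuming the abbreviated revision string from the text (d23b094) is accepted as-is; the ablation config names themselves are listed next.

```python
# Sketch: pin the older revision that still contains the prompt ablations.
# The revision hash comes from the text above; the ablation config name
# (e.g. "humaneval-cpp-keep") follows the SRCDATA-LANG-keep pattern described below.
from datasets import load_dataset

ablation = load_dataset(
    "nuprl/MultiPL-E",
    "humaneval-cpp-keep",   # an ablation config, per the list below
    revision="d23b094",     # older revision that still includes the ablations
    split="test",
)
```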
These are the prompt variations:
- SRCDATA-LANG-keep is the same as SRCDATA-LANG, but the text of the prompt is totally unchanged. If the original prompt had Python doctests, they remain as Python instead of being translated to LANG. If the original prompt had Python-specific terminology, e.g., "list", it remains "list", instead of being translated, e.g., to "vector" for C++.
- SRCDATA-LANG-transform transforms the doctests to LANG but leaves the natural language text of the prompt unchanged.
- SRCDATA-LANG-removed removes the doctests from the prompt.
Note that MBPP does not have any doctests, so the "removed" and "transform" variations are not available for MBPP.
Changelog
Version 3.3
This update fixes a Lua bug. We had a spurious stop token that would have negatively impacted all Lua results. Re-evaluating models on Lua with this fix should produce results that are identical or slightly higher. See Issue 165 for more information.
Version 3.2
MultiPL-E now supports Ada, thanks to Rowan Walshe. Rowan identified some issues that likely have a small negative impact on the benchmark scores for existing languages. We have not updated the prompts for those languages at this time. See the discussions PR 162 and PR 163.
Version 3.1.1
This version fixes a bug that affected some TypeScript problems, thanks to Niels Mündler. The issue impacts MBPP-based problems. The fix also changes whitespace in a few HumanEval-based problems, which should be insignificant. These are the relevant changes:
=== mbpp-ts_prompt_mbpp_253_count_integer.diff ===
- function count_integer(list1: number| string| number[]): number {
+ function count_integer(list1: (number | string | number)[]): number {
=== mbpp-ts_prompt_mbpp_278_count_first_elements.diff ===
- function count_first_elements(test_tup: number| [number, number][]): number {
+ function count_first_elements(test_tup: (number | [number, number])[]): number {
=== mbpp-ts_prompt_mbpp_294_max_val.diff ===
- function max_val(listval: string| number[]): number {
+ function max_val(listval: (string | number)[]): number {
=== mbpp-ts_prompt_mbpp_297_flatten_list.diff ===
- function flatten_list(list1: number| number[][]): number[] {
+ function flatten_list(list1: (number | number[])[]): number[] {
=== mbpp-ts_prompt_mbpp_405_check_tuplex.diff ===
- function check_tuplex(tuplex: string| number[], tuple1: any): boolean {
+ function check_tuplex(tuplex: (string | number)[], tuple1: any): boolean {
=== mbpp-ts_prompt_mbpp_410_min_val.diff ===
- function min_val(listval: string| number[]): number {
+ function min_val(listval: (string | number)[]): number {
=== mbpp-ts_prompt_mbpp_419_round_and_sum.diff ===
- function round_and_sum(list1: number| number[]): number {
+ function round_and_sum(list1: (number | number)[]): number {
=== mbpp-ts_prompt_mbpp_65_recursive_list_sum.diff ===
- function recursive_list_sum(data_list: number| number[][]): number {
+ function recursive_list_sum(data_list: (number | number[])[]): number {
=== mbpp-ts_prompt_mbpp_755_second_smallest.diff ===
- function second_smallest(numbers: number| number[]): number | undefined {
+ function second_smallest(numbers: (number | number)[]): number | undefined {
See GitHub Issue 160 for more information.
Version 3.1
MultiPL-E now supports Dart, thanks to Devon Carew.
Version 3.0
This is the first significant update since MultiPL-E was used in StarCoder 1.
- The dataset was versioned at 3.0, and we are bumping the software version to stay in sync.
- We no longer publish the MultiPL-E ablations, but they are available in revision d23b094 and earlier.
- New programming languages supported:
- Clojure, thanks to Alex Miller
- Elixir, thanks to Marko Vukovic
- Haskell, thanks to Thomas Dwyer
- OCaml, thanks to John Gouwar
- Changes to existing HumanEval-based problems:
- Four Scala problems have fixed prompts/tests (12, 90, 128, 162).
- Some whitespace-only changes to problems for Racket (18 problems), R (36 problems), Julia (159 problems), and D (156 problems). We will try to avoid these kinds of changes in the future.
- The MBPP-based problems have changes analogous to the HumanEval-based problems.
See the directory diffs_v3.0 in the dataset repository for the diffs to
each prompt.
Version 0.5.0
Instruction-following support and new languages
- New languages: Luau, Elixir, Lean, Coq, Dafny
- Support for instruction-following prompts
- vLLM support for faster evaluation
Version 0.4.0
QoL improvements and new languages
- New languages: OCaml, MATLAB
- Using .jsonl instead of .json for prompts
- Several bugfixes to prompts
Version 0.3.0
This version was used to evaluate StarCoder
This version corrects several bugs in prompts and test cases that resulted in lower pass@k rates for some of the statically typed languages. The most significant difference is that the pass@k for Java increases by about 2% on HumanEval.
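For readers unfamiliar with the metric, pass@k here is the standard unbiased estimator popularized by HumanEval, which MultiPL-E inherits from its source benchmarks; the sketch below illustrates that formula and is not the project's own evaluation code.

```python
# Illustrative sketch of the standard unbiased pass@k estimator
# (1 - C(n-c, k) / C(n, k)), computed as a numerically stable product.
# Not necessarily the exact implementation used by MultiPL-E.
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: samples generated per problem, c: samples that passed, k: budget."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 200 samples per problem, 37 passing, estimate pass@10.
print(round(pass_at_k(200, 37, 10), 4))
```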
Version 0.2.0
This version was used to evaluate SantaCoder
Structured Schema (Zero-Fabrication)
| Feature Key | Data Type |
|---|---|
| name | string |
| language | string |
| prompt | string |
| doctests | string |
| original | string |
| prompt_terminology | string |
| tests | string |
| stop_tokens | unknown |
Estimated Rows: 157
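As a hedged sketch of how these fields typically fit together in a MultiPL-E-style harness (an assumption, not the project's own code): the prompt is concatenated with a model completion, the completion is truncated at the first stop token, and the tests are appended to produce an executable program. This also assumes stop_tokens is a list of strings, which the schema above leaves as unknown.

```python
# Hypothetical helper illustrating how the fields above might be combined.
# Assumes row["stop_tokens"] is a list of strings; the schema marks it unknown.
def assemble_program(row: dict, completion: str) -> str:
    # Truncate the completion at the earliest occurrence of any stop token.
    cut = len(completion)
    for token in row["stop_tokens"]:
        idx = completion.find(token)
        if idx != -1:
            cut = min(cut, idx)
    # Prompt + truncated completion + unit tests = a runnable candidate program.
    return row["prompt"] + completion[:cut] + "\n" + row["tests"]
```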
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-dataset--nuprl--multipl-e
- slug: nuprl--multipl-e
- source: huggingface
- author: nuprl
- license: mit
- tags: annotations_creators:machine-generated, language_creators:machine-generated, language_creators:expert-generated, multilinguality:monolingual, source_datasets:original, source_datasets:extended|openai_humaneval, source_datasets:extended|mbpp, language:en, license:mit, size_categories:10k<n<100k, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2301.03988, arxiv:2305.06161, doi:10.57967/hf/4446, region:us
Technical Specs
- architecture: null
- params billions: null
- context length: null
- pipeline tag:
Engagement & Metrics
- downloads: 73,765
- stars: 65
- forks: 0
Data indexed from public sources. Updated daily.