⚠️

This is a Dataset, not a Model

The following metrics do not apply: FNI Score, Deployment Options, Model Architecture

📊

mbpp

FNI 20.5
by google-research-datasets Dataset

annotations_creators: crowdsourced, expert-generated
language_creators: crowdsourced, expert-generated
language: en
license: cc-by-4.0
multilinguality: monolingual
size_categories: n…

Best Scenarios

✨ Data Science

Technical Constraints

Generic Use
Parquet Format
204 Likes


🔬 Deep Dive


🛠️ Technical Profile

⚑ Hardware & Scale

Size
-
Total Rows
-
Files
10

🧠 Training & Env

Format
Parquet
Cleaning
Raw

🌐 Cloud & Rights

Source
huggingface
License
cc-by-4.0


🧬 Schema & Configs

Fields (full config)

task_id: int32
text: string
code: string
test_list: list<string>
test_setup_code: string
challenge_test_list: list<string>

Dataset Card

Dataset Card for Mostly Basic Python Problems (mbpp)

Table of Contents

- Table of Contents
- Dataset Description
  - Dataset Summary
  - Supported Tasks and Leaderboards
  - Languages
- Dataset Structure
  - Data Instances
  - Data Fields
  - Data Splits
- Dataset Creation
  - Curation Rationale
  - Source Data
    - Initial Data Collection and Normalization
    - Who are the source language producers?
  - Annotations
    - Annotation process
    - Who are the annotators?
  - Personal and Sensitive Information
- Considerations for Using the Data
  - Social Impact of Dataset
  - Discussion of Biases
  - Other Known Limitations
- Additional Information
  - Dataset Curators
  - Licensing Information
  - Citation Information
  - Contributions

Dataset Description

Dataset Summary

The benchmark consists of around 1,000 crowd-sourced Python programming problems. They are designed to be solvable by entry-level programmers and cover programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, a code solution, and three automated test cases. As described in the paper, a subset of the data has been hand-verified by us.
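To make this structure concrete, here is a minimal sketch that represents one problem as a Python record and checks that its reference solution passes its test cases. It assumes that each test case is a Python `assert` statement stored as a string, which is how MBPP stores them. The `Problem` class and the `run_tests` helper are hypothetical names. The `add_two` problem is made up for illustration and is not taken from the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    """One MBPP-style problem: a description, a reference solution, and tests."""
    task_id: int
    text: str                 # natural-language task description
    code: str                 # reference Python solution
    test_list: list[str] = field(default_factory=list)  # assert statements


def run_tests(solution: str, tests: list[str]) -> bool:
    """Execute the solution, then each assert; return True only if all pass.

    Warning: exec() runs arbitrary code, so sandbox untrusted solutions.
    """
    namespace: dict = {}
    try:
        exec(solution, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:  # failed assert, runtime error, or syntax error
        return False
    return True


# Hypothetical problem for illustration only (not taken from the dataset).
example = Problem(
    task_id=0,
    text="Write a function to add two integers.",
    code="def add_two(a, b):\n    return a + b",
    test_list=[
        "assert add_two(1, 2) == 3",
        "assert add_two(-1, 1) == 0",
        "assert add_two(0, 0) == 0",
    ],
)

print(run_tests(example.code, example.test_list))  # True
```

Running the solution and its tests in one shared namespace lets each `assert` call the function that the solution defines.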

Released here as part of Program Synthesis with Large Language Models, Austin et al., 2021.

Supported Tasks and Leaderboards

This dataset is used to evaluate code generation.
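A common way to score a model on a benchmark like this is to count a problem as solved when the generated code passes all of its test cases, and then to report the fraction of problems solved. The sketch below is not the paper's evaluation harness. It assumes one generated solution per task ID and uses hypothetical names (`passes_all`, `fraction_solved`, `problems`, `generations`). The optional `setup` argument mirrors the `test_setup_code` field of the full configuration.

```python
def passes_all(solution: str, tests: list[str], setup: str = "") -> bool:
    """Run optional setup code, the candidate solution, then every assert test.

    Warning: exec() runs untrusted model output; use a sandbox in practice.
    """
    namespace: dict = {}
    try:
        exec(setup, namespace)
        exec(solution, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:  # failed assert, crash, or syntax error
        return False
    return True


def fraction_solved(problems: dict[int, list[str]], generations: dict[int, str]) -> float:
    """Share of problems whose generated solution passes all of its tests.

    Problems with no generation count as unsolved.
    """
    if not problems:
        return 0.0
    solved = sum(
        passes_all(generations.get(task_id, ""), tests)
        for task_id, tests in problems.items()
    )
    return solved / len(problems)


# Hypothetical tests and model outputs, for illustration only.
problems = {
    1: ["assert square(3) == 9", "assert square(0) == 0", "assert square(-2) == 4"],
    2: ["assert is_even(4)", "assert not is_even(3)", "assert is_even(0)"],
}
generations = {
    1: "def square(x):\n    return x * x",
    2: "def is_even(n):\n    return n % 2 == 1",  # buggy: inverted parity
}

print(fraction_solved(problems, generations))  # 0.5
```

A missing generation is executed as an empty string, so its tests fail with a `NameError` and the problem counts as unsolved. No special-casing is needed.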

Languages

English (task descriptions) and Python (code)

Dataset Structure

```python
from datasets import load_dataset

dataset_full = load_dataset("mbpp")
print(dataset_full)
# DatasetDict({
#     test: Dataset({
#         features: ['task_id', 'text', 'code', 'test_list', 'test_setup_code', 'challenge_test_list'],
#         num_rows: 974
#     })
# })

dataset_sanitized = load_dataset("mbpp", "sanitized")
print(dataset_sanitized)
# DatasetDict({
#     test: Dataset({
#         features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
#         num_rows: 427
#     })
# })
```
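The two configurations name overlapping fields differently. The full config stores the description in `text` and the setup code in `test_setup_code`, while the sanitized config uses `prompt` and `test_imports`. The sketch below maps a row from either config onto one shape, under these assumptions:

- Rows arrive as plain dicts with the field names listed above.
- `test_setup_code` is a string.
- `test_imports` is a list of import statements.

The function name `normalize_row`, its output keys, and the example row values are hypothetical.

```python
from typing import Any


def normalize_row(row: dict[str, Any]) -> dict[str, Any]:
    """Map a row from either the full or the sanitized MBPP config onto one shape.

    Assumes full-config rows carry 'text' and 'test_setup_code' (a string),
    and sanitized rows carry 'prompt' and 'test_imports' (a list of strings).
    """
    if "prompt" in row:  # sanitized config
        description = row["prompt"]
        setup = "\n".join(row.get("test_imports", []))
    else:  # full config
        description = row["text"]
        setup = row.get("test_setup_code", "")
    return {
        "task_id": row["task_id"],
        "description": description,
        "code": row["code"],
        "setup": setup,
        "tests": list(row["test_list"]),
    }


# Hypothetical rows shaped like each config (values are made up).
full_row = {
    "task_id": 7,
    "text": "Write a function to reverse a string.",
    "code": "def rev(s):\n    return s[::-1]",
    "test_list": ["assert rev('ab') == 'ba'"],
    "test_setup_code": "",
    "challenge_test_list": [],
}
sanitized_row = {
    "source_file": "example.jsonl",
    "task_id": 7,
    "prompt": "Write a function to reverse a string.",
    "code": "def rev(s):\n    return s[::-1]",
    "test_imports": ["import math"],
    "test_list": ["assert rev('ab') == 'ba'"],
}

print(normalize_row(full_row)["description"] == normalize_row(sanitized_row)["description"])  # True
```

Keying on the presence of `prompt` keeps the helper independent of which config was loaded. Fields that exist in only one config, such as `challenge_test_list`, are dropped.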

Data Instances
