mbpp
Dataset Card for Mostly Basic Python Problems (mbpp)
Table of Contents
- Table of Contents
- Dataset Description
  - Dataset Summary
  - Supported Tasks and Leaderboards
  - Languages
- Dataset Structure
  - Data Instances
  - Data Fields
  - Data Splits
- Dataset Creation
  - Curation Rationale
  - Source Data
    - Initial Data Collection and Normalization
    - Who are the source language producers?
  - Annotations
    - Annotation process
    - Who are the annotators?
  - Personal and Sensitive Information
- Considerations for Using the Data
  - Social Impact of Dataset
  - Discussion of Biases
  - Other Known Limitations
- Additional Information
  - Dataset Curators
  - Licensing Information
  - Citation Information
  - Contributions

Dataset Description
- Repository: https://github.com/google-research/google-research/tree/master/mbpp
- Paper: Program Synthesis with Large Language Models
Dataset Summary
The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, a code solution, and 3 automated test cases. As described in the paper, a subset of the data has been hand-verified by the paper's authors. Released here as part of Program Synthesis with Large Language Models, Austin et al., 2021.
Supported Tasks and Leaderboards
This dataset is used to evaluate code generation.
Languages
English - Python code
Dataset Structure
dataset_full = load_dataset("mbpp")
DatasetDict({
    test: Dataset({
        features: ['task_id', 'text', 'code', 'test_list', 'test_setup_code', 'challenge_test_list'],
        num_rows: 974
    })
})

dataset_sanitized = load_dataset("mbpp", "sanitized")
DatasetDict({
    test: Dataset({
        features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
        num_rows: 427
    })
})
Data Instances
#### mbpp - full
{
'task_id': 1,
'text': 'Write a function to find the minimum cost path to reach (m, n) from (0, 0) for the given cost matrix cost[][] and a position (m, n) in cost[][].',
'code': 'R = 3\r\nC = 3\r\ndef min_cost(cost, m, n): \r\n\ttc = [[0 for x in range(C)] for x in range(R)] \r\n\ttc[0][0] = cost[0][0] \r\n\tfor i in range(1, m+1): \r\n\t\ttc[i][0] = tc[i-1][0] + cost[i][0] \r\n\tfor j in range(1, n+1): \r\n\t\ttc[0][j] = tc[0][j-1] + cost[0][j] \r\n\tfor i in range(1, m+1): \r\n\t\tfor j in range(1, n+1): \r\n\t\t\ttc[i][j] = min(tc[i-1][j-1], tc[i-1][j], tc[i][j-1]) + cost[i][j] \r\n\treturn tc[m][n]',
'test_list': [
'assert min_cost([[1, 2, 3], [4, 8, 2], [1, 5, 3]], 2, 2) == 8',
'assert min_cost([[2, 3, 4], [5, 9, 3], [2, 6, 4]], 2, 2) == 12',
'assert min_cost([[3, 4, 5], [6, 10, 4], [3, 7, 5]], 2, 2) == 16'],
'test_setup_code': '',
'challenge_test_list': []
}
#### mbpp - sanitized
{
'source_file': 'Benchmark Questions Verification V2.ipynb',
'task_id': 2,
'prompt': 'Write a function to find the shared elements from the given two lists.',
'code': 'def similar_elements(test_tup1, test_tup2):\n res = tuple(set(test_tup1) & set(test_tup2))\n return (res) ',
'test_imports': [],
'test_list': [
'assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))',
'assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))',
'assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))'
]
}
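The records above pair a reference solution with assert-based tests, so a record can be checked by running the solution and its tests together. The following is a minimal sketch of that idea, not an official evaluation harness: the helper name `run_record_tests`, the file name `candidate.py`, and the 10-second timeout are assumptions made for illustration. It runs the program in a separate Python process with a timeout, which guards against hangs but is not a security sandbox; untrusted generated code still needs a properly isolated environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_record_tests(code: str, tests: list[str], setup: str = "",
                     timeout: float = 10.0) -> bool:
    """Return True if `code` passes every assert in `tests`.

    Hypothetical helper: the program runs in a separate Python process
    with a timeout. This limits hangs but is NOT a security sandbox.
    """
    # Some records in the full configuration use '\r\n' line endings.
    source = code.replace("\r\n", "\n")
    program = "\n".join([setup, source, *tests]) + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
    # A failing assert raises AssertionError, giving a non-zero exit code.
    return result.returncode == 0


# The sanitized record shown above (task_id 2).
record = {
    "code": "def similar_elements(test_tup1, test_tup2):\n"
            " res = tuple(set(test_tup1) & set(test_tup2))\n"
            " return (res) ",
    "test_imports": [],
    "test_list": [
        "assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))",
        "assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))",
        "assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))",
    ],
}

passed = run_record_tests(
    record["code"],
    record["test_list"],
    setup="\n".join(record["test_imports"]),
)
```

To score a model-generated solution instead of the reference one, pass the generated source as `code` and keep the record's `test_list` unchanged.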
Data Fields
- source_file: unknown
- text/prompt: description of programming task
- code: solution for programming task
- test_setup_code/test_imports: necessary code imports to execute tests
- test_list: list of tests to verify solution
- challenge_test_list: list of more challenging tests to further probe solution
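Because the two versions name corresponding fields differently (text vs. prompt, test_setup_code vs. test_imports), code that handles both may want a single record shape. The sketch below shows one way to do that. The function name `normalize_record` and the output keys (`description`, `setup`, `tests`, `challenge_tests`) are hypothetical names chosen for illustration, and the two input records are toy examples rather than real dataset rows.

```python
def normalize_record(record: dict) -> dict:
    """Map a full or sanitized record onto one common set of keys.

    The output key names are hypothetical; they are not part of the dataset.
    """
    if "prompt" in record:
        # Sanitized version: description is in 'prompt', imports are a list.
        description = record["prompt"]
        setup = "\n".join(record.get("test_imports", []))
    else:
        # Full version: description is in 'text', setup code is one string.
        description = record["text"]
        setup = record.get("test_setup_code", "")
    return {
        "task_id": record["task_id"],
        "description": description,
        "code": record["code"],
        "setup": setup,
        "tests": list(record["test_list"]),
        # Only the full version carries challenge tests.
        "challenge_tests": list(record.get("challenge_test_list", [])),
    }


# Toy records (not real dataset rows) in the two field layouts.
full_style = {
    "task_id": 0,
    "text": "Write a function to add two numbers.",
    "code": "def add(a, b):\n    return a + b",
    "test_list": ["assert add(1, 2) == 3"],
    "test_setup_code": "",
    "challenge_test_list": [],
}
sanitized_style = {
    "source_file": "example.ipynb",
    "task_id": 0,
    "prompt": "Write a function to add two numbers.",
    "code": "def add(a, b):\n    return a + b",
    "test_imports": ["import math"],
    "test_list": ["assert add(1, 2) == 3"],
}

normalized_full = normalize_record(full_style)
normalized_sanitized = normalize_record(sanitized_style)
```

Both calls return dictionaries with the same keys, so downstream code can read `description`, `setup`, and `tests` without checking which version a record came from.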
Data Splits
There are two versions of the dataset (full and sanitized), each with four splits:
- train
- evaluation
- test
- prompt

The prompt split corresponds to samples used for few-shot prompting and not for training.
Dataset Creation
See section 2.1 of the original paper.
Curation Rationale
Evaluating code generation requires a set of simple programming tasks together with reference solutions, which this dataset provides.
Source Data
#### Initial Data Collection and Normalization
The dataset was manually created from scratch.
#### Who are the source language producers?
The dataset was created with an internal crowdsourcing effort at Google.
Annotations
#### Annotation process
The full dataset was created first and a subset then underwent a second round to improve the task descriptions.
#### Who are the annotators?
The dataset was created with an internal crowdsourcing effort at Google.
Personal and Sensitive Information
None.
Considerations for Using the Data
Make sure you execute generated Python code in a safe environment when evaluating against this dataset, as generated code could be harmful.
Social Impact of Dataset
With this dataset, code-generating models can be evaluated better, which leads to fewer issues being introduced when such models are used.
Discussion of Biases
Other Known Limitations
The task descriptions might not be expressive enough to solve the task. The sanitized version aims to address this issue by having a second round of annotators improve the dataset.
Additional Information
Dataset Curators
Google Research
Licensing Information
CC-BY-4.0
Citation Information
@article{austin2021program,
  title={Program Synthesis with Large Language Models},
  author={Austin, Jacob and Odena, Augustus and Nye, Maxwell and Bosma, Maarten and Michalewski, Henryk and Dohan, David and Jiang, Ellen and Cai, Carrie and Terry, Michael and Le, Quoc and others},
  journal={arXiv preprint arXiv:2108.07732},
  year={2021}
}
Contributions
Thanks to @lvwerra for adding this dataset.