xquad
"--- annotations_creators: - expert-generated language_creators: - expert-generated language: - ar - de - el - en - es - hi - ro - ru - th - tr - vi - zh license: - cc-by-sa-4.0 multilinguality: - multilingual size_categories: - unknown source_datasets: - extended|squad task_categories: - question-an..."
Dataset Card for "xquad"
Table of Contents
- Dataset Summary
- Supported Tasks and Leaderboards
- Languages
- Data Instances
- Data Fields
- Data Splits
- Curation Rationale
- Source Data
- Annotations
- Personal and Sensitive Information
- Social Impact of Dataset
- Discussion of Biases
- Other Known Limitations
- Dataset Curators
- Licensing Information
- Citation Information
- Contributions

Dataset Description
- Homepage: https://github.com/deepmind/xquad
- Repository: More Information Needed
- Paper: More Information Needed
- Point of Contact: More Information Needed
- Size of downloaded dataset files: 146.31 MB
- Size of the generated dataset: 18.97 MB
- Total amount of disk used: 165.28 MB
Dataset Summary
XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.
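As a minimal sketch of how one language could be selected, the helper below builds a config name of the form `xquad.<code>`, matching the per-language sections of this card. The helper names `xquad_config` and `load_xquad` are hypothetical. The loader assumes the Hugging Face `datasets` library is installed and that the dataset is published under the hub id `xquad`, as the card title suggests; it also assumes the languages not shown in detail here follow the same naming pattern.

```python
# The eleven language codes named in the summary above (English plus ten translations).
XQUAD_LANGS = ("ar", "de", "el", "en", "es", "hi", "ru", "th", "tr", "vi", "zh")


def xquad_config(lang: str) -> str:
    """Return the config name for a language code, e.g. 'de' -> 'xquad.de'."""
    code = lang.strip().lower()
    if code not in XQUAD_LANGS:
        raise ValueError(f"unsupported XQuAD language code: {lang!r}")
    return f"xquad.{code}"


def load_xquad(lang: str):
    """Load the validation split for one language (requires network access).

    Assumption: the dataset is reachable under the hub id "xquad".
    """
    # Imported lazily so xquad_config() works without the library installed.
    from datasets import load_dataset

    return load_dataset("xquad", xquad_config(lang), split="validation")
```

Calling `load_xquad("de")` would then request the `xquad.de` config; nothing is downloaded until that function is called.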
Supported Tasks and Leaderboards
Languages
Dataset Structure
Data Instances
#### xquad.ar
- Size of downloaded dataset files: 13.30 MB
- Size of the generated dataset: 1.72 MB
- Total amount of disk used: 15.03 MB
This example was too long and was cropped:

{
  "answers": {
    "answer_start": [527],
    "text": ["136"]
  },
  "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
  "id": "56beb4343aeaaa14008c925c",
  "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.de
- Size of downloaded dataset files: 13.30 MB
- Size of the generated dataset: 1.29 MB
- Total amount of disk used: 14.59 MB
This example was too long and was cropped:

{
  "answers": {
    "answer_start": [527],
    "text": ["136"]
  },
  "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
  "id": "56beb4343aeaaa14008c925c",
  "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.el
- Size of downloaded dataset files: 13.30 MB
- Size of the generated dataset: 2.21 MB
- Total amount of disk used: 15.51 MB
This example was too long and was cropped:

{
  "answers": {
    "answer_start": [527],
    "text": ["136"]
  },
  "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
  "id": "56beb4343aeaaa14008c925c",
  "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.en
- Size of downloaded dataset files: 13.30 MB
- Size of the generated dataset: 1.12 MB
- Total amount of disk used: 14.42 MB
This example was too long and was cropped:

{
  "answers": {
    "answer_start": [527],
    "text": ["136"]
  },
  "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
  "id": "56beb4343aeaaa14008c925c",
  "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
#### xquad.es
- Size of downloaded dataset files: 13.30 MB
- Size of the generated dataset: 1.28 MB
- Total amount of disk used: 14.58 MB
This example was too long and was cropped:

{
  "answers": {
    "answer_start": [527],
    "text": ["136"]
  },
  "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
  "id": "56beb4343aeaaa14008c925c",
  "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
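The instances above pair each answer `text` with an `answer_start` value. As a hedged sketch, assuming `answer_start` is a character offset into `context` (the SQuAD convention), the answer can be recovered by slicing. The function names and the toy record are hypothetical; the record is invented for illustration because the contexts shown in this card are cropped.

```python
from typing import Any, Dict, List


def answer_spans(example: Dict[str, Any]) -> List[str]:
    """Slice each answer out of the context using its start offset."""
    context = example["context"]
    answers = example["answers"]
    return [
        context[start:start + len(text)]
        for start, text in zip(answers["answer_start"], answers["text"])
    ]


def is_consistent(example: Dict[str, Any]) -> bool:
    """True if every sliced span equals the stored answer text."""
    return answer_spans(example) == list(example["answers"]["text"])


# Hypothetical toy record with the same fields as the instances above.
_context = "The bridge opened in 1932 and spans 503 metres."
toy_example = {
    "id": "toy-0001",
    "context": _context,
    "question": "How long is the bridge?",
    "answers": {"answer_start": [_context.index("503")], "text": ["503"]},
}
```

Running `is_consistent` over a loaded split is a cheap sanity check that offsets and answer strings still line up after any preprocessing.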
Data Fields
The data fields are the same among all splits.
#### xquad.ar
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
  - `text`: a `string` feature.
  - `answer_start`: an `int32` feature.

#### xquad.de
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
  - `text`: a `string` feature.
  - `answer_start`: an `int32` feature.

#### xquad.el
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
  - `text`: a `string` feature.
  - `answer_start`: an `int32` feature.

#### xquad.en
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
  - `text`: a `string` feature.
  - `answer_start`: an `int32` feature.

#### xquad.es
- `id`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
  - `text`: a `string` feature.
  - `answer_start`: an `int32` feature.

Data Splits
| name | validation |
| -------- | ---------: |
| xquad.ar | 1190 |
| xquad.de | 1190 |
| xquad.el | 1190 |
| xquad.en | 1190 |
| xquad.es | 1190 |
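Every config holds the same 1190 validation examples, and the summary describes the data as fully parallel, so records from two languages can be paired. The sketch below does this by `id`, under the assumption that a translated example keeps the `id` of its source example. The function name `align_by_id` and the toy records are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]

# Number of validation examples per config, as listed in the table above.
EXPECTED_VALIDATION_ROWS = 1190


def align_by_id(source: Iterable[Record], target: Iterable[Record]) -> List[Tuple[Record, Record]]:
    """Pair records from two language configs that share the same id.

    Assumption: a translated example keeps the id of its source example.
    Records without a counterpart in `target` are skipped.
    """
    target_by_id = {record["id"]: record for record in target}
    return [
        (record, target_by_id[record["id"]])
        for record in source
        if record["id"] in target_by_id
    ]


# Hypothetical toy records standing in for two language configs.
toy_en = [{"id": "a1", "question": "Who?"}, {"id": "b2", "question": "When?"}]
toy_de = [{"id": "b2", "question": "Wann?"}, {"id": "a1", "question": "Wer?"}]
toy_pairs = align_by_id(toy_en, toy_de)
```

On real data, comparing `len(align_by_id(...))` with `EXPECTED_VALIDATION_ROWS` would show whether the two configs are in fact fully parallel.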
Dataset Creation
Curation Rationale
Source Data
#### Initial Data Collection and Normalization
#### Who are the source language producers?
Annotations
#### Annotation process
#### Who are the annotators?
Personal and Sensitive Information
Considerations for Using the Data
Social Impact of Dataset
Discussion of Biases
Other Known Limitations
Additional Information
Dataset Curators
Licensing Information
Citation Information
@article{Artetxe:etal:2019,
author = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
title = {On the cross-lingual transferability of monolingual representations},
journal = {CoRR},
volume = {abs/1910.11856},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.11856}
}
Contributions
Thanks to @lewtun, @patrickvonplaten, @thomwolf for adding this dataset.