xquad

by google Dataset

"--- annotations_creators: - expert-generated language_creators: - expert-generated language: - ar - de - el - en - es - hi - ro - ru - th - tr - vi - zh license: - cc-by-sa-4.0 multilinguality: - multilingual size_categories: - unknown source_datasets: - extended|squad task_categories: - question-an..."

Technical Profile

Files: 14
Format: Parquet
Cleaning: Raw
Source: huggingface
License: cc-by-sa-4.0

Dataset Card for "xquad"

Table of Contents

- Dataset Description
  - Dataset Summary
  - Supported Tasks and Leaderboards
  - Languages
- Dataset Structure
  - Data Instances
  - Data Fields
  - Data Splits
- Dataset Creation
  - Curation Rationale
  - Source Data
  - Annotations
  - Personal and Sensitive Information
- Considerations for Using the Data
  - Social Impact of Dataset
  - Discussion of Biases
  - Other Known Limitations
- Additional Information
  - Dataset Curators
  - Licensing Information
  - Citation Information
  - Contributions

Dataset Description

Homepage: https://github.com/deepmind/xquad
Repository: More Information Needed
Paper: More Information Needed
Point of Contact: More Information Needed
Size of downloaded dataset files: 146.31 MB
Size of the generated dataset: 18.97 MB
Total amount of disk used: 165.28 MB

Dataset Summary

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.
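Because every language version translates the same SQuAD subset, examples can be lined up across languages. The following is a minimal sketch, assuming that translations of the same question share an `id` (the sample instances in this card suggest so). The records are hypothetical toy data, not real dataset rows, and `align_by_id` is an illustrative helper rather than part of any library.

```python
from collections import defaultdict


def align_by_id(records_by_lang):
    """Group parallel examples as {id: {lang: record}}.

    Assumes translations of the same question share an `id`.
    """
    aligned = defaultdict(dict)
    for lang, records in records_by_lang.items():
        for record in records:
            aligned[record["id"]][lang] = record
    return dict(aligned)


# Hypothetical toy records, not real XQuAD rows.
toy = {
    "en": [{"id": "q1", "question": "How many sacks?"}],
    "de": [{"id": "q1", "question": "Wie viele Sacks?"}],
}
parallel = align_by_id(toy)
print(sorted(parallel["q1"]))  # languages available for question "q1"
```

Keying on `id` rather than on row position keeps the alignment valid even if the per-language files are ordered differently.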

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

#### xquad.ar

Size of downloaded dataset files: 13.30 MB
Size of the generated dataset: 1.72 MB
Total amount of disk used: 15.03 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{ "answers": { "answer_start": [527], "text": ["136"] }, "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...", "id": "56beb4343aeaaa14008c925c", "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?" }

#### xquad.de

Size of downloaded dataset files: 13.30 MB
Size of the generated dataset: 1.29 MB
Total amount of disk used: 14.59 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{ "answers": { "answer_start": [527], "text": ["136"] }, "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...", "id": "56beb4343aeaaa14008c925c", "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?" }

#### xquad.el

Size of downloaded dataset files: 13.30 MB
Size of the generated dataset: 2.21 MB
Total amount of disk used: 15.51 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{ "answers": { "answer_start": [527], "text": ["136"] }, "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...", "id": "56beb4343aeaaa14008c925c", "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?" }

#### xquad.en

Size of downloaded dataset files: 13.30 MB
Size of the generated dataset: 1.12 MB
Total amount of disk used: 14.42 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{ "answers": { "answer_start": [527], "text": ["136"] }, "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...", "id": "56beb4343aeaaa14008c925c", "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?" }

#### xquad.es

Size of downloaded dataset files: 13.30 MB
Size of the generated dataset: 1.28 MB
Total amount of disk used: 14.58 MB

An example of 'validation' looks as follows.

This example was too long and was cropped:

{ "answers": { "answer_start": [527], "text": ["136"] }, "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...", "id": "56beb4343aeaaa14008c925c", "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?" }

Data Fields

The data fields are the same across all configurations and splits.

#### xquad.ar

id: a string feature.
context: a string feature.
question: a string feature.
answers: a dictionary feature containing:

- text: a string feature.
- answer_start: an int32 feature.

#### xquad.de

id: a string feature.
context: a string feature.
question: a string feature.
answers: a dictionary feature containing:

- text: a string feature.
- answer_start: an int32 feature.

#### xquad.el

id: a string feature.
context: a string feature.
question: a string feature.
answers: a dictionary feature containing:

- text: a string feature.
- answer_start: an int32 feature.

#### xquad.en

id: a string feature.
context: a string feature.
question: a string feature.
answers: a dictionary feature containing:

- text: a string feature.
- answer_start: an int32 feature.

#### xquad.es

id: a string feature.
context: a string feature.
question: a string feature.
answers: a dictionary feature containing:

- text: a string feature.
- answer_start: an int32 feature.
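A minimal sketch of how these fields fit together, assuming (as in SQuAD v1.1) that `answer_start` is a character offset into `context` and that `text` and `answer_start` are parallel lists. The record below is hypothetical, and `answer_spans` is an illustrative helper, not part of any library.

```python
def answer_spans(record):
    """Return (start, end, text) per answer, checking each against the context."""
    answers = record["answers"]
    spans = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        end = start + len(text)
        # Assumption: the offset counts characters, not bytes or tokens.
        if record["context"][start:end] != text:
            raise ValueError(f"answer {text!r} not found at offset {start}")
        spans.append((start, end, text))
    return spans


# Hypothetical record following the field layout described above.
record = {
    "id": "example-0",
    "context": "Jared Allen recorded 136 sacks in his career.",
    "question": "How many sacks did Jared Allen record?",
    "answers": {"text": ["136"], "answer_start": [21]},
}
print(answer_spans(record))
```

A check like this may be worth running on the translated configurations in particular, since offsets have to be recomputed for each language.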

Data Splits

| name     | validation |
| -------- | ---------: |
| xquad.ar |       1190 |
| xquad.de |       1190 |
| xquad.el |       1190 |
| xquad.en |       1190 |
| xquad.es |       1190 |
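A sketch of loading one configuration, assuming the Hugging Face `datasets` library is installed. The Hub id `google/xquad` is an assumption based on the "by google" attribution on this page, and the helper names `config_name` and `load_xquad` are illustrative. Each configuration listed above exposes only a `validation` split.

```python
def config_name(lang):
    """Build the configuration name used in this card, e.g. 'xquad.de'."""
    return f"xquad.{lang}"


def load_xquad(lang, repo_id="google/xquad"):
    """Load the validation split for one language (needs network access)."""
    # Imported lazily so the helper above works without the dependency.
    from datasets import load_dataset

    return load_dataset(repo_id, config_name(lang), split="validation")


# Usage (not run here because it downloads data):
# ds = load_xquad("de")
# print(ds[0]["question"])
```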

Dataset Creation

Curation Rationale

More Information Needed

Source Data

#### Initial Data Collection and Normalization

More Information Needed

#### Who are the source language producers?

More Information Needed

Annotations

#### Annotation process

More Information Needed

#### Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information

@article{Artetxe:etal:2019,
      author    = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
      title     = {On the cross-lingual transferability of monolingual representations},
      journal   = {CoRR},
      volume    = {abs/1910.11856},
      year      = {2019},
      archivePrefix = {arXiv},
      eprint    = {1910.11856}
}

Contributions

Thanks to @lewtun, @patrickvonplaten, @thomwolf for adding this dataset.
