HotpotQA
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--hotpotqa--hotpot_qa |
| License | cc-by-sa-4.0 |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__hotpotqa__hotpot_qa,
author = {hotpotqa},
title = {Hotpot Qa Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/hotpotqa/hotpot_qa}},
note = {Accessed via Free2AITools Knowledge Fortress}
}

Technical Deep Dive
Full Specifications
Nexus Index V2.0
Index Insight
FNI V2.0 for Hotpot Qa: Semantic (S:50), Authority (A:0), Popularity (P:58), Recency (R:27), Quality (Q:30).
Data Preview
Row-level preview not available for this dataset.
Schema structure is shown in the Field Logic panel when available.
Field Logic
Schema not yet indexed for this dataset.
Dataset Specification
Dataset Card for "hotpot_qa"
Table of Contents
- Dataset Description
- Dataset Structure
- Dataset Creation
- Considerations for Using the Data
- Additional Information
Dataset Description
- Homepage: https://hotpotqa.github.io/
- Repository: https://github.com/hotpotqa/hotpot
- Paper: HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Point of Contact: More Information Needed
- Size of downloaded dataset files: 1.27 GB
- Size of the generated dataset: 1.24 GB
- Total amount of disk used: 2.52 GB
Dataset Summary
HotpotQA is a dataset of 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison.
Supported Tasks and Leaderboards
Languages
Dataset Structure
Data Instances
distractor
- Size of downloaded dataset files: 612.75 MB
- Size of the generated dataset: 598.66 MB
- Total amount of disk used: 1.21 GB
An example of 'validation' looks as follows.
{
"answer": "This is the answer",
"context": {
"sentences": [["Sent 1"], ["Sent 21", "Sent 22"]],
"title": ["Title1", "Title 2"]
},
"id": "000001",
"level": "medium",
"question": "What is the answer?",
"supporting_facts": {
"sent_id": [0, 1, 3],
"title": ["Title of para 1", "Title of para 2", "Title of para 3"]
},
"type": "comparison"
}
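Given an instance with this layout, each sentence-level supporting fact can be resolved back to its text by pairing a (title, sent_id) entry with the matching context paragraph. A minimal sketch (the helper name is ours, not part of the dataset):

```python
def resolve_supporting_sentences(example):
    """Map each (title, sent_id) supporting fact to its sentence text.

    Facts whose title or sentence index is absent from the context are
    skipped, since the provided context paragraphs are not guaranteed to
    contain every gold fact.
    """
    # Index context paragraphs by title for O(1) lookup.
    by_title = dict(zip(example["context"]["title"],
                        example["context"]["sentences"]))
    resolved = []
    sf = example["supporting_facts"]
    for title, sent_id in zip(sf["title"], sf["sent_id"]):
        sentences = by_title.get(title)
        if sentences is not None and 0 <= sent_id < len(sentences):
            resolved.append((title, sentences[sent_id]))
    return resolved
```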
fullwiki
- Size of downloaded dataset files: 660.10 MB
- Size of the generated dataset: 645.80 MB
- Total amount of disk used: 1.31 GB
An example of 'train' looks as follows.
{
"answer": "This is the answer",
"context": {
"sentences": [["Sent 1"], ["Sent 2"]],
"title": ["Title1", "Title 2"]
},
"id": "000001",
"level": "hard",
"question": "What is the answer?",
"supporting_facts": {
"sent_id": [0, 1, 3],
"title": ["Title of para 1", "Title of para 2", "Title of para 3"]
},
"type": "bridge"
}
Data Fields
The data fields are the same among all splits.
distractor
- id: a string feature.
- question: a string feature.
- answer: a string feature.
- type: a string feature.
- level: a string feature.
- supporting_facts: a dictionary feature containing:
  - title: a string feature.
  - sent_id: an int32 feature.
- context: a dictionary feature containing:
  - title: a string feature.
  - sentences: a list of string features.
fullwiki
- id: a string feature.
- question: a string feature.
- answer: a string feature.
- type: a string feature.
- level: a string feature.
- supporting_facts: a dictionary feature containing:
  - title: a string feature.
  - sent_id: an int32 feature.
- context: a dictionary feature containing:
  - title: a string feature.
  - sentences: a list of string features.
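The nested field layout described above can be sanity-checked with a small validator. This is an illustrative sketch (the function name is ours, not an official API); the field names are taken from the card:

```python
STRING_FIELDS = ("id", "question", "answer", "type", "level")

def validate_example(ex):
    """Return True if `ex` matches the documented HotpotQA field layout."""
    if not all(isinstance(ex.get(f), str) for f in STRING_FIELDS):
        return False
    sf = ex.get("supporting_facts", {})
    ctx = ex.get("context", {})
    return (
        all(isinstance(t, str) for t in sf.get("title", []))
        and all(isinstance(i, int) for i in sf.get("sent_id", []))
        and all(isinstance(t, str) for t in ctx.get("title", []))
        # context["sentences"] is a list of paragraphs, each a list of strings
        and all(isinstance(s, str)
                for para in ctx.get("sentences", []) for s in para)
    )
```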
Data Splits
distractor
| | train | validation |
|---|---|---|
| distractor | 90447 | 7405 |
fullwiki
| | train | validation | test |
|---|---|---|---|
| fullwiki | 90447 | 7405 | 7405 |
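Either configuration can be loaded with the Hugging Face `datasets` library (requires network access). The wrapper below is a sketch; its split check merely mirrors the tables above:

```python
# Splits per configuration, as documented in the tables above.
CONFIG_SPLITS = {
    "distractor": ("train", "validation"),
    "fullwiki": ("train", "validation", "test"),
}

def load_hotpotqa(config="distractor", split="validation"):
    """Load one split of HotpotQA; raises ValueError for unknown splits."""
    if config not in CONFIG_SPLITS:
        raise ValueError(f"unknown config {config!r}")
    if split not in CONFIG_SPLITS[config]:
        raise ValueError(f"{config!r} has no {split!r} split")
    # Imported lazily so the split check works without `datasets` installed.
    from datasets import load_dataset
    return load_dataset("hotpotqa/hotpot_qa", config, split=split)
```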
Dataset Creation
Curation Rationale
Source Data
Initial Data Collection and Normalization
Who are the source language producers?
Annotations
Annotation process
Who are the annotators?
Personal and Sensitive Information
Considerations for Using the Data
Social Impact of Dataset
Discussion of Biases
Other Known Limitations
Additional Information
Dataset Curators
Licensing Information
HotpotQA is distributed under a CC BY-SA 4.0 License.
Citation Information
@inproceedings{yang2018hotpotqa,
title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},
author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.},
booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
year={2018}
}
Contributions
Thanks to @albertvillanova, @ghomasHudson for adding this dataset.
Structured Schema (Zero-Fabrication)
| Feature Key | Data Type |
|---|---|
| id | string |
| question | string |
| answer | string |
| type | string |
| level | string |
| supporting_facts | unknown |
| context | unknown |
Estimated Rows: 97,852
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id
- hf-dataset--hotpotqa--hotpot_qa
- slug
- hotpotqa--hotpot_qa
- source
- huggingface
- author
- hotpotqa
- license
- cc-by-sa-4.0
- tags
- task_categories:question-answering, annotations_creators:crowdsourced, language_creators:found, multilinguality:monolingual, source_datasets:original, language:en, license:cc-by-sa-4.0, size_categories:100k<n<1m, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, arxiv:1809.09600, region:us, multi-hop
Engagement & Metrics
- downloads
- 80,334
- stars
- 285
- forks
- 0
Data indexed from public sources. Updated daily.