HotpotQA
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--hotpotqa--hotpot_qa |
| License | cc-by-sa-4.0 |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__hotpotqa__hotpot_qa,
author = {hotpotqa},
title = {Hotpot Qa Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/hotpotqa/hotpot_qa}},
note = {Accessed via Free2AITools Knowledge Fortress}
}

Technical Deep Dive
Full Specifications
Nexus Index V2.0
Index Insight
FNI V2.0 for Hotpot Qa: Semantic (S:50), Authority (A:0), Popularity (P:58), Recency (R:27), Quality (Q:30).
Data Preview
Row-level preview not available for this dataset.
Schema structure is shown in the Field Logic panel when available.
Field Logic
Schema not yet indexed for this dataset.
Dataset Specification
Dataset Card for "hotpot_qa"
Table of Contents
- Dataset Description
- Dataset Structure
- Dataset Creation
- Considerations for Using the Data
- Additional Information
Dataset Description
- Homepage: https://hotpotqa.github.io/
- Repository: https://github.com/hotpotqa/hotpot
- Paper: HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Point of Contact: More Information Needed
- Size of downloaded dataset files: 1.27 GB
- Size of the generated dataset: 1.24 GB
- Total amount of disk used: 2.52 GB
Dataset Summary
HotpotQA is a dataset of 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison.
Supported Tasks and Leaderboards
Languages
Dataset Structure
Data Instances
distractor
- Size of downloaded dataset files: 612.75 MB
- Size of the generated dataset: 598.66 MB
- Total amount of disk used: 1.21 GB
An example of 'validation' looks as follows.
{
"answer": "This is the answer",
"context": {
"sentences": [["Sent 1"], ["Sent 21", "Sent 22"]],
"title": ["Title1", "Title 2"]
},
"id": "000001",
"level": "medium",
"question": "What is the answer?",
"supporting_facts": {
"sent_id": [0, 1, 3],
"title": ["Title of para 1", "Title of para 2", "Title of para 3"]
},
"type": "comparison"
}
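Given an instance with this layout, each sentence-level supporting fact can be resolved back to its text by pairing a (title, sent_id) entry with the matching context paragraph. A minimal sketch (the helper name is ours, not part of the dataset):

```python
def resolve_supporting_sentences(example):
    """Map each (title, sent_id) supporting fact to its sentence text.

    Facts whose title or sentence index is absent from the context are
    skipped, since the provided context paragraphs are not guaranteed to
    contain every gold fact.
    """
    # Index context paragraphs by title for O(1) lookup.
    by_title = dict(zip(example["context"]["title"],
                        example["context"]["sentences"]))
    resolved = []
    sf = example["supporting_facts"]
    for title, sent_id in zip(sf["title"], sf["sent_id"]):
        sentences = by_title.get(title)
        if sentences is not None and 0 <= sent_id < len(sentences):
            resolved.append((title, sentences[sent_id]))
    return resolved
```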
fullwiki
- Size of downloaded dataset files: 660.10 MB
- Size of the generated dataset: 645.80 MB
- Total amount of disk used: 1.31 GB
An example of 'train' looks as follows.
{
"answer": "This is the answer",
"context": {
"sentences": [["Sent 1"], ["Sent 2"]],
"title": ["Title1", "Title 2"]
},
"id": "000001",
"level": "hard",
"question": "What is the answer?",
"supporting_facts": {
"sent_id": [0, 1, 3],
"title": ["Title of para 1", "Title of para 2", "Title of para 3"]
},
"type": "bridge"
}
Data Fields
The data fields are the same among all splits.
distractor
- id: a string feature.
- question: a string feature.
- answer: a string feature.
- type: a string feature.
- level: a string feature.
- supporting_facts: a dictionary feature containing:
  - title: a string feature.
  - sent_id: an int32 feature.
- context: a dictionary feature containing:
  - title: a string feature.
  - sentences: a list of string features.
fullwiki
- id: a string feature.
- question: a string feature.
- answer: a string feature.
- type: a string feature.
- level: a string feature.
- supporting_facts: a dictionary feature containing:
  - title: a string feature.
  - sent_id: an int32 feature.
- context: a dictionary feature containing:
  - title: a string feature.
  - sentences: a list of string features.
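The nested field layout described above can be sanity-checked with a small validator. This is an illustrative sketch (the function name is ours, not an official API); the field names are taken from the card:

```python
STRING_FIELDS = ("id", "question", "answer", "type", "level")

def validate_example(ex):
    """Return True if `ex` matches the documented HotpotQA field layout."""
    if not all(isinstance(ex.get(f), str) for f in STRING_FIELDS):
        return False
    sf = ex.get("supporting_facts", {})
    ctx = ex.get("context", {})
    return (
        all(isinstance(t, str) for t in sf.get("title", []))
        and all(isinstance(i, int) for i in sf.get("sent_id", []))
        and all(isinstance(t, str) for t in ctx.get("title", []))
        # context["sentences"] is a list of paragraphs, each a list of strings
        and all(isinstance(s, str)
                for para in ctx.get("sentences", []) for s in para)
    )
```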
Data Splits
distractor
| | train | validation |
|---|---|---|
| distractor | 90447 | 7405 |
fullwiki
| | train | validation | test |
|---|---|---|---|
| fullwiki | 90447 | 7405 | 7405 |
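Either configuration can be loaded with the Hugging Face `datasets` library (requires network access). The wrapper below is a sketch; its split check merely mirrors the tables above:

```python
# Splits per configuration, as documented in the tables above.
CONFIG_SPLITS = {
    "distractor": ("train", "validation"),
    "fullwiki": ("train", "validation", "test"),
}

def load_hotpotqa(config="distractor", split="validation"):
    """Load one split of HotpotQA; raises ValueError for unknown splits."""
    if config not in CONFIG_SPLITS:
        raise ValueError(f"unknown config {config!r}")
    if split not in CONFIG_SPLITS[config]:
        raise ValueError(f"{config!r} has no {split!r} split")
    # Imported lazily so the split check works without `datasets` installed.
    from datasets import load_dataset
    return load_dataset("hotpotqa/hotpot_qa", config, split=split)
```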
Dataset Creation
Curation Rationale
Source Data
Initial Data Collection and Normalization
Who are the source language producers?
Annotations
Annotation process
Who are the annotators?
Personal and Sensitive Information
Considerations for Using the Data
Social Impact of Dataset
Discussion of Biases
Other Known Limitations
Additional Information
Dataset Curators
Licensing Information
HotpotQA is distributed under a CC BY-SA 4.0 License.
Citation Information
@inproceedings{yang2018hotpotqa,
title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},
author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.},
booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
year={2018}
}
Contributions
Thanks to @albertvillanova, @ghomasHudson for adding this dataset.
Structured Schema (Zero-Fabrication)
| Feature Key | Data Type |
|---|---|
| id | string |
| question | string |
| answer | string |
| type | string |
| level | string |
| supporting_facts | unknown |
| context | unknown |
Estimated Rows: 97,852
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id
- hf-dataset--hotpotqa--hotpot_qa
- slug
- hotpotqa--hotpot_qa
- source
- huggingface
- author
- hotpotqa
- license
- cc-by-sa-4.0
- tags
- task_categories:question-answering, annotations_creators:crowdsourced, language_creators:found, multilinguality:monolingual, source_datasets:original, language:en, license:cc-by-sa-4.0, size_categories:100k<n<1m, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, arxiv:1809.09600, region:us, multi-hop
Engagement & Metrics
- downloads
- 80,334
- stars
- 285
- forks
- 0
Data indexed from public sources. Updated daily.