Roberta Base Squad2

by deepset


Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__huggingface__deepset__roberta_base_squad2,
  author = {deepset},
  title = {Roberta Base Squad2 Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/deepset/roberta-base-squad2}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

deepset. (2026). Roberta Base Squad2 [Dataset]. Free2AITools. https://huggingface.co/deepset/roberta-base-squad2

Downloads: 730,222
Likes: 930


Dataset Specification

language: en
license: cc-by-4.0
datasets:
- squad_v2
model-index:
- name: deepset/roberta-base-squad2
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 79.9309
      name: Exact Match
      verified: true
      verifyToken: >-
        eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDhhNjg5YzNiZGQ1YTIyYTAwZGUwOWEzZTRiYzdjM2QzYjA3ZTUxNDM1NjE1MTUyMjE1MGY1YzEzMjRjYzVjYiIsInZlcnNpb24iOjF9.EH5JJo8EEFwU7osPz3s7qanw_tigeCFhCXjSfyN0Y1nWVnSfulSxIk_DbAEI5iE80V4EKLyp5-mYFodWvL2KDA
    - type: f1
      value: 82.9501
      name: F1
      verified: true
      verifyToken: >-
        eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjk5ZDYwOGQyNjNkMWI0OTE4YzRmOTlkY2JjNjQ0YTZkNTMzMzNkYTA0MDFmNmI3NjA3NjNlMjhiMDQ2ZjJjNSIsInZlcnNpb24iOjF9.DDm0LNTkdLbGsue58bg1aH_s67KfbcmkvL-6ZiI2s8IoxhHJMSf29H_uV2YLyevwx900t-MwTVOW3qfFnMMEAQ
    - type: total
      value: 11869
      name: total
      verified: true
      verifyToken: >-
        eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMGFkMmI2ODM0NmY5NGNkNmUxYWViOWYxZDNkY2EzYWFmOWI4N2VhYzY5MGEzMTVhOTU4Zjc4YWViOGNjOWJjMCIsInZlcnNpb24iOjF9.fexrU1icJK5_MiifBtZWkeUvpmFISqBLDXSQJ8E6UnrRof-7cU0s4tX_dIsauHWtUpIHMPZCf5dlMWQKXZuAAA
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 85.289
      name: Exact Match
    - type: f1
      value: 91.841
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: adversarial_qa
      type: adversarial_qa
      config: adversarialQA
      split: validation
    metrics:
    - type: exact_match
      value: 29.5
      name: Exact Match
    - type: f1
      value: 40.367
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_adversarial
      type: squad_adversarial
      config: AddOneSent
      split: validation
    metrics:
    - type: exact_match
      value: 78.567
      name: Exact Match
    - type: f1
      value: 84.469
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts amazon
      type: squadshifts
      config: amazon
      split: test
    metrics:
    - type: exact_match
      value: 69.924
      name: Exact Match
    - type: f1
      value: 83.284
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts new_wiki
      type: squadshifts
      config: new_wiki
      split: test
    metrics:
    - type: exact_match
      value: 81.204
      name: Exact Match
    - type: f1
      value: 90.595
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts nyt
      type: squadshifts
      config: nyt
      split: test
    metrics:
    - type: exact_match
      value: 82.931
      name: Exact Match
    - type: f1
      value: 90.756
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts reddit
      type: squadshifts
      config: reddit
      split: test
    metrics:
    - type: exact_match
      value: 71.55
      name: Exact Match
    - type: f1
      value: 82.939
      name: F1
base_model:
- FacebookAI/roberta-base

roberta-base for Extractive QA

This is the roberta-base model, fine-tuned on the SQuAD2.0 dataset. It has been trained on question-answer pairs, including unanswerable questions, for the task of extractive question answering.
We have also released a distilled version of this model called deepset/tinyroberta-squad2. It has comparable prediction quality and runs at twice the speed of deepset/roberta-base-squad2.

Overview

Language model: roberta-base
Language: English
Downstream-task: Extractive QA
Training data: SQuAD 2.0
Eval data: SQuAD 2.0
Code: See an example extractive QA pipeline built with Haystack
Infrastructure: 4x Tesla V100

Hyperparameters

batch_size = 96
n_epochs = 2
base_LM_model = "roberta-base"
max_seq_len = 386
learning_rate = 3e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
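Two of the hyperparameters above are easy to misread, so here is a rough sketch of what they control. It assumes that `LinearWarmup` means a linear ramp from 0 to `learning_rate` over the first `warmup_proportion` of training steps, followed by a linear decay back to 0 (the shape of `get_linear_schedule_with_warmup` in transformers). It also assumes that `doc_stride` is the step between the start positions of consecutive context windows, as in the original BERT SQuAD script; note that the `stride` argument of Hugging Face tokenizers instead means the number of overlapping tokens. The helper names `lr_at_step` and `context_windows` are hypothetical, and the window length in the example is an illustrative value, not one taken from the training run.

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float = 3e-5,
               warmup_proportion: float = 0.2) -> float:
    """Learning rate under an assumed linear warmup + linear decay schedule."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr (end of warmup) to 0 (last step).
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


def context_windows(n_tokens: int, window_len: int, doc_stride: int = 128):
    """Split n_tokens context tokens into overlapping (start, end) windows.

    Assumes doc_stride is the step between window starts, so consecutive
    windows overlap by window_len - doc_stride tokens.
    """
    if not 0 < doc_stride <= window_len:
        raise ValueError("need 0 < doc_stride <= window_len")
    windows = []
    start = 0
    while True:
        end = min(start + window_len, n_tokens)
        windows.append((start, end))
        if end >= n_tokens:
            return windows
        start += doc_stride
```

With a hypothetical budget of 318 context tokens per window, a 700-token context would be covered by four windows: (0, 318), (128, 446), (256, 574) and (384, 700). Each window is scored separately, which is how answers beyond `max_seq_len` can still be found.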

Usage

In Haystack

Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents.
To load and run the model with Haystack:

# After running pip install haystack-ai "transformers[torch,sentencepiece]"

from haystack import Document
from haystack.components.readers import ExtractiveReader

docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]

reader = ExtractiveReader(model="deepset/roberta-base-squad2")
reader.warm_up()

question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)
# {'answers': [ExtractedAnswer(query='What is a popular programming language?', score=0.5740374326705933, data='python', document=Document(id=..., content: '...'), context=None, document_offset=ExtractedAnswer.Span(start=0, end=6),...)]}

For a complete example with an extractive question answering pipeline that scales over many documents, check out the corresponding Haystack tutorial.

In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
res = nlp(QA_input)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
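If you run the model from step b) directly, `model(**tokenizer(question, context, return_tensors="pt"))` returns per-token `start_logits` and `end_logits`, and you have to turn them into an answer yourself. The sketch below is a simplified, framework-free illustration of SQuAD 2.0-style decoding, assuming the common convention that the first token (`<s>` for RoBERTa) scores the "no answer" option. The function name `decode_answer` and the `null_threshold` parameter are hypothetical; the transformers pipeline has its own decoding (with options such as `max_answer_len` and `handle_impossible_answer`) that differs in details such as score normalization.

```python
from typing import List, Optional, Tuple


def decode_answer(start_logits: List[float], end_logits: List[float],
                  context_start: int, context_end: int,
                  max_answer_len: int = 15,
                  null_threshold: float = 0.0
                  ) -> Tuple[Optional[Tuple[int, int]], float]:
    """Pick the best answer span, or no answer, from per-token logits.

    Index 0 (the <s> token) stands for "no answer". Candidate spans must lie
    inside [context_start, context_end) and be at most max_answer_len tokens.
    """
    null_score = start_logits[0] + end_logits[0]
    best_span, best_score = None, float("-inf")
    for i in range(context_start, context_end):
        for j in range(i, min(i + max_answer_len, context_end)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_span, best_score = (i, j), score
    # Predict "no answer" when the null score beats the best span by more
    # than null_threshold, or when no valid span exists.
    if best_span is None or null_score - best_score > null_threshold:
        return None, null_score
    return best_span, best_score
```

Raising `null_threshold` makes the model answer more often; lowering it makes it abstain more often. The returned token indices still have to be mapped back to character offsets in the context (for example via the tokenizer's offset mapping).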

Performance

Evaluated on the SQuAD 2.0 dev set with the official eval script.

"exact": 79.87029394424324,
"f1": 82.91251169582613,

"total": 11873,
"HasAns_exact": 77.93522267206478,
"HasAns_f1": 84.02838248389763,
"HasAns_total": 5928,
"NoAns_exact": 81.79983179142137,
"NoAns_f1": 81.79983179142137,
"NoAns_total": 5945
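To make these numbers easier to interpret, here is a condensed sketch modeled on the official SQuAD 2.0 evaluation script. Answers are lowercased and stripped of punctuation, the articles "a", "an" and "the", and extra whitespace; exact match compares the normalized strings, and F1 is computed over overlapping tokens, taking the maximum over all gold answers. An unanswerable question has an empty gold answer, so both metrics are 1 only when the prediction is also empty, which is why NoAns_exact equals NoAns_f1. The overall scores are then the HasAns and NoAns scores weighted by their question counts. The helper names are hypothetical, and the sketch omits the official script's no-answer probability threshold.

```python
import collections
import re
import string
from typing import List, Tuple

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(gold: str, pred: str) -> int:
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold: str, pred: str) -> float:
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        # Unanswerable case: full credit only if both are empty.
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def score_question(gold_answers: List[str], pred: str) -> Tuple[int, float]:
    """Max EM and F1 over gold answers; no gold answers means unanswerable."""
    golds = gold_answers or [""]
    return (max(exact_match(g, pred) for g in golds),
            max(f1_score(g, pred) for g in golds))


def weighted_overall(has_ans: float, has_total: int,
                     no_ans: float, no_total: int) -> float:
    """Combine HasAns and NoAns scores into the overall score."""
    return (has_ans * has_total + no_ans * no_total) / (has_total + no_total)


# Recombine the reported sub-scores; this should reproduce "exact" and "f1".
exact = weighted_overall(77.93522267206478, 5928, 81.79983179142137, 5945)
f1 = weighted_overall(84.02838248389763, 5928, 81.79983179142137, 5945)
print(f"exact={exact:.4f} f1={f1:.4f}")
```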

Authors

Branden Chan
Timo Möller
Malte Pietsch
Tanay Soni

About us

deepset is the company behind the production-ready open-source AI framework Haystack.

Some of our other work:

Get in touch and join the Haystack community

For more info on Haystack, visit our GitHub repo and Documentation.

We also have a Discord community open to everyone!

By the way: we're hiring!


🆔 Identity & Source

id: hf-dataset--huggingface--deepset--roberta-base-squad2
source: huggingface
author: deepset
tags: transformers, pytorch, tf, jax, rust, safetensors, roberta, question-answering, en, dataset:squad_v2, base_model:facebookai/roberta-base, base_model:finetune:facebookai/roberta-base, license:cc-by-4.0, model-index, endpoints_compatible, deploy:azure, region:us

⚙️ Technical Specs

architecture: roberta
parameters: 124,057,092
context length: 512 tokens
pipeline tag: question-answering
