LLM Election Data 2024
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--sarahcen--llm-election-data-2024 |
| License | MIT |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__sarahcen__llm_election_data_2024,
  author = {sarahcen},
  title = {LLM Election Data 2024 Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/sarahcen/llm-election-data-2024}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
Technical Deep Dive
Dataset Specification
Data Release for Large-Scale, Longitudinal Survey of Large Language Models (LLMs) During the 2024 US Elections
Overview
This repository contains the questions asked of and responses given by LLMs during the 2024 US elections, collected for a longitudinal survey conducted from July 23, 2024 to November 12, 2024. The study is described in detail in the paper "Large-Scale, Longitudinal Study of Large Language Models During the 2024 US Election Season" by Sarah H. Cen, Andrew Ilyas, Hedi Driss, Charlotte Park, Aspen K. Hopkins, Chara Podimata, and Aleksander Madry.
As described below and in detail in our paper, 12 LLMs (some of which were equipped with internet access) were queried daily (with the exception of Claude 3 Opus, which was queried weekly) on a fixed set of questions over approximately 4 months. In addition, the same questions were asked of Google search as a baseline. The questions varied by type and topic, and each question was asked multiple times with different prompt variations (e.g., different steering and different instructions). This repository contains the questions, responses, and other relevant documentation as well as sample code.
Usage and Quick Start
The file sample_code.py contains an example of how to load and filter the data.
Summary of Data Collection Procedure
Details about the data collection procedure can be found in the full paper. A brief description is provided below.
Models
We queried 12 models:
- We used model APIs to directly query 9 LLMs (all "offline," i.e., without internet access, except Perplexity, which is "online").
- For the remaining 3 models/systems, we simulated querying models equipped with internet access by combining model APIs with Google search (using Serper) via LangChain.
- We directly queried Google search using SerpAPI as a baseline.
The models are given in reference_jsons/models_and_questions.json.
All models were queried daily, with the exception of Claude 3 Opus, which was queried weekly.
Note that Perplexity used llama-3.1-sonar-large-32k-online before Sept. 3 and llama-3.1-sonar-large-128k-online thereafter.
Question Taxonomy and Generation
All questions were hand-crafted according to the taxonomy and procedure given below.
The final questions can be found in reference_jsons/all_questions_flattened.json.
Taxonomy. All questions are first divided by type (exo, endo, or baseline), then category, and then subcategory. See reference_jsons/election_questions_taxonomy.json for the full question taxonomy.
Question generation procedure. The final questions were generated in three steps:
1. Each question is associated with a base question template, such as "What is {candidate}'s position on {issue} as a political issue in the 2024 US presidential election?", where candidate and issue are placeholders.
2. The placeholders are substituted with all combinations of placeholder values, as specified in reference_jsons/election_questions_taxonomy.json. This creates what we refer to as the pre-prompt questions.
3. Each pre-prompt question results in 22 final questions, which are generated by modifying each pre-prompt question using the prompt variations given in reference_jsons/prompt_variations.json. (Note that the default prompt variation "{}" is referred to as none in each CSV's prompt_type column.)
In summary, the question generation process is [question template] -> replace placeholders -> [pre-prompt question] -> add prompt variation -> [final question]. The full set of unique questions is in reference_jsons/all_questions_flattened.json.
Baseline questions.
We added 32 baseline questions that are not related to the election. The baseline questions are taken from well-known benchmark datasets and are given in reference_jsons/stable_baselines_taxonomy.json.
Dataset
The collected data is given in CSVs in the raw_data folder, in which there are nested folders with the following structure:
raw_data/
└── [model]/
    └── [question type]/
        └── [question category]/
            └── [pre-prompt question hash]/
                └── [prompt hash].csv
Explanation:
Each CSV file contains the responses of a specific LLM (as given by the outermost folder, model) to a specific question. Each question is uniquely specified by the pre-prompt question hash and the prompt hash, which correspond to the [pre-prompt question hash] folder and the [prompt hash] filename, respectively. The nested folders [question type] and [question category] subdivide the questions/CSVs for navigability.
- [model] - Name of model queried. Full list of models given in reference_jsons/models_and_questions.json. Specific endpoints/snapshots are described in the full paper.
- [question type] - Type of question (endo, exo, or baseline). See reference_jsons/models_and_questions.json.
- [question category] - Category of question (e.g., election issues). See reference_jsons/models_and_questions.json.
- [pre-prompt question hash] - Unique hash identifier for each unique pre-prompt question (i.e., the question before prompt variation). The hash generation code is given below.
- [prompt hash] - Unique hash identifier for each prompt variation/type. Unique prompt variations are in reference_jsons/prompt_variations.json, and the hash generation code is given below.
Hash Generation
The following code shows how the pre-prompt question hashes, prompt type hashes, and base question template hashes were generated (which can be found in reference_jsons/pre_prompt_q_hash_mapping.json, reference_jsons/prompt_type_hash_mapping.json, and reference_jsons/base_q_template_hash_mapping.json). It also shows how to obtain the dictionaries mapping hash to original string and vice versa.
import hashlib

import pandas as pd  # needed to load the raw_data CSVs into the DataFrame df below

def hash_string(input_string):
    """Hashes a string using SHA-256 and returns the hexadecimal digest."""
    hash_object = hashlib.sha256(input_string.encode())
    return hash_object.hexdigest()

def create_hash_mapping(df, column_name):
    """Creates a dictionary mapping hashes to original strings from a DataFrame column."""
    hash_dict = {hash_string(value): value for value in df[column_name].unique()}
    return hash_dict

def load_rev_mapping(mapping):
    """Builds the reverse mapping (original string -> hash) from a hash mapping."""
    return {v: k for k, v in mapping.items()}

# df is assumed to be a DataFrame loaded from one of the raw_data CSVs, e.g.:
# df = pd.read_csv("raw_data/<model>/<question type>/<question category>/<pre-prompt question hash>/<prompt hash>.csv")

# How hashes for base_q_template, pre_prompt_q, and prompt_type were generated
df.loc[:, "base_q_template_hash"] = df["base_q_template"].apply(hash_string)
df.loc[:, "pre_prompt_q_hash"] = df["pre_prompt_q"].apply(hash_string)
df.loc[:, "prompt_type_hash"] = df["prompt_type"].apply(hash_string)

# How to get dictionaries mapping base_q_template, pre_prompt_q, and prompt_type
# from hash to original string for all unique strings in the DataFrame
base_q_template_hash_dict = create_hash_mapping(df, "base_q_template")
pre_prompt_q_hash_dict = create_hash_mapping(df, "pre_prompt_q")
prompt_type_hash_dict = create_hash_mapping(df, "prompt_type")

# How to get the inverse mapping (from original string to hash) from the hash mapping
inv_base_q_template_hash = load_rev_mapping(base_q_template_hash_dict)
inv_pre_prompt_q_hash = load_rev_mapping(pre_prompt_q_hash_dict)
inv_prompt_type_hash = load_rev_mapping(prompt_type_hash_dict)
Data Format
Each CSV file in raw_data contains the following columns:
| Column Name | Description |
|---------------------------|-------------|
| model | Name of the model queried |
| date | Date of query |
| type | Question type (endo, exo, or baseline) |
| category | Question category (e.g., election issues) |
| subcategory | Question subcategory (often omitted) |
| prompt_type | Type of prompt variation applied to the pre_prompt_q |
| frequency | How often this question was asked (daily or weekly) |
| base_q_template | Base template from which the question was derived |
| placeholders | Dictionary of any placeholders in the base template and the placeholders' values |
| pre_prompt_q | Question after placeholders have been applied to base_q_template (before prompt variation) |
| question | Full query/question text (after placeholders replaced in template and prompt variation applied) |
| response | Model's response |
| timestamp | Timestamp when model was queried |
| answer | For baseline questions only, the correct answer |
| base_q_template_hash | Hash of the base question template base_q_template |
| pre_prompt_q_hash | Hash of the pre-prompt question pre_prompt_q |
| prompt_type_hash | Hash corresponding to the prompt variation prompt_type |
| query_type | Type of question phrasing (this is not utilized in our analyses) |
| variation_type | Whether template has any placeholders (this is not utilized in our analyses) |
Citation
If you use this dataset in your research, please cite:
@inproceedings{largescale2025cen,
  author = {Cen, Sarah H. and Ilyas, Andrew and Driss, Hedi and Park, Charlotte and Hopkins, Aspen and Podimata, Chara and Madry, Aleksander},
  title = {Large-Scale, Longitudinal Study of Large Language Models During the 2024 {US} Election Season},
  year = {2025}
}
Contact
For questions or issues, please reach out to Sarah Cen at [email protected]
Last Updated: September 16, 2025
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-dataset--sarahcen--llm-election-data-2024
- slug: sarahcen--llm-election-data-2024
- source: huggingface
- author: sarahcen
- license: MIT
- tags: task_categories:text-generation, task_categories:question-answering, language:en, license:mit, arxiv:2509.18446, region:us, elections, politics, llm-evaluation, longitudinal-study, us-election-2024
Engagement & Metrics
- downloads: 63,542
- stars: 3
- forks: 0
Data indexed from public sources. Updated daily.