Dataset

PDB

Name: PDB
Creator: LiteFold
License: CC0-1.0

by LiteFold litefold/pdb

Free2AITools Nexus Index

59.9

S: Semantic 50

Query-time baseline · scored live at search

A: Authority 33

P: Popularity 52

R: Recency 78

Q: Quality 50

Dataset Information Summary
Entity Passport
Registry ID	litefold/pdb
License	CC0-1.0
Provider	huggingface

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset_litefold_pdb,
  author = {LiteFold},
  title = {PDB Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/LiteFold/PDB}},
  note = {Accessed via Free2AITools.}
}

APA Style

LiteFold. (2026). PDB [Dataset]. Free2AITools. https://huggingface.co/datasets/LiteFold/PDB

💬 Index Insight

FNI V2.0 for PDB: Authority (A:33), Popularity (P:52), Recency (R:78), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.

Data Sources / Provenance

HuggingFace API · GitHub Metadata · Arxiv Citation DB · Methodology

Open data Updated: Live data

Downloads

35,490

Dataset Specification

PDB mmCIF Entry Index

The Protein Data Bank is the single global archive of experimentally-determined 3D structures of biological macromolecules, established in 1971 and now holding well over 230,000 entries. It stores atomic coordinates for proteins, nucleic acids, and their complexes determined by X-ray crystallography, cryo-EM, NMR, micro-electron diffraction, and integrative methods, along with the underlying experimental data (structure factors, EM maps, NMR restraints) and rich metadata covering sequence, ligands, modifications, oligomeric state, and validation reports. Every entry has a four-character PDB ID (e.g. 7PZB) and is distributed primarily in the mmCIF format, with legacy PDB-format files retained for compatibility.

Operationally, the archive is jointly managed by the wwPDB consortium: RCSB PDB at Rutgers and UCSD handles deposits from the Americas and Oceania and serves as the wwPDB Archive Keeper, PDBe at EMBL-EBI handles Europe and Africa, PDBj at Osaka University handles Asia, and BMRB hosts NMR-specific data. All wwPDB sites receive synchronized weekly updates and serve the archive free of charge under CC0.

Within structural biology and protein ML, the PDB is the canonical training and validation source for structure prediction (AlphaFold2/3, RoseTTAFold, Protenix, OpenFold), inverse folding (ProteinMPNN, ESM-IF), docking, MD setup, and template-based modelling, and time-cutoff splits on PDB release dates are the standard way to control for data leakage when benchmarking these models.
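As a rough illustration of a time-cutoff split, the sketch below partitions entries by release date. The entry IDs, dates, the cutoff, and the helper name `time_cutoff_split` are all placeholders made up for this example; they do not come from this dataset, which uses a hash-based split instead.

```python
from datetime import date

# Placeholder (pdb_id, release_date) records, invented for illustration only.
# In practice the release date would be read from each entry's metadata.
entries = [
    ("1ABC", date(2019, 5, 1)),
    ("2DEF", date(2021, 9, 30)),
    ("3GHI", date(2021, 10, 1)),
    ("4JKL", date(2023, 2, 14)),
]


def time_cutoff_split(records, cutoff):
    """Entries released strictly before `cutoff` go to train; the rest are held out."""
    train = [pid for pid, released in records if released < cutoff]
    held_out = [pid for pid, released in records if released >= cutoff]
    return train, held_out


# Hypothetical cutoff date, chosen only to divide the placeholder records.
train_ids, eval_ids = time_cutoff_split(entries, date(2021, 10, 1))
```

Because the evaluation set only contains structures released on or after the cutoff, a model trained on the earlier set cannot have seen them, which is the leakage control the paragraph above describes.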

Splits

Split	Rows
train	88,873
test	9,951
total	98,824

The split is deterministic: sha256(pdb_id) % 10 == 0 goes to test; buckets 1 through 9 go to train.
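The rule above could be reproduced along these lines. This is a minimal sketch, not the dataset authors' code: the page does not say how the ID is encoded or how the digest is turned into a number, so the sketch assumes the ID is hashed as UTF-8 text exactly as given and the full hex digest is read as one integer. The function names `bucket_of` and `split_for` are hypothetical.

```python
import hashlib


def bucket_of(pdb_id: str) -> int:
    """Map a PDB ID to a bucket in 0..9 from its SHA-256 digest.

    Assumptions (not confirmed by the source): UTF-8 encoding, the ID used
    exactly as given, and the whole digest interpreted as one integer.
    """
    digest = hashlib.sha256(pdb_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10


def split_for(pdb_id: str) -> str:
    """Bucket 0 goes to test; buckets 1 through 9 go to train."""
    return "test" if bucket_of(pdb_id) == 0 else "train"
```

Calling `split_for("7PZB")` returns the same answer on every run and every machine, which is what makes the split reproducible without storing an assignment table. Note that under these assumptions letter case matters: `"7PZB"` and `"7pzb"` hash differently, so the ID casing would need to match whatever convention the dataset uses.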

Dataset Statistics

| Metr

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--litefold--pdb
slug: litefold--pdb
source: huggingface
author: LiteFold
license: CC0-1.0
tags: license:cc0-1.0, size_categories:10k<n<100k, format:parquet, modality:tabular, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us, biology, proteins, protein-structure, pdb, rcsb, mmcif, parquet

📊 Engagement & Metrics

downloads: 35,490
stars: null
forks: null

Data indexed from public sources. Updated daily.