Dataset

DeepPatent

by Voxel51 hf-dataset--voxel51--deeppatent



Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--voxel51--deeppatent
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__voxel51__deeppatent,
  author = {Voxel51},
  title = {DeepPatent Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/Voxel51/DeepPatent}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

Voxel51. (2026). DeepPatent [Dataset]. Free2AITools. https://huggingface.co/datasets/Voxel51/DeepPatent

🔬Technical Deep Dive


⚖️ Nexus Index V16.5

Methodology Index Protocol

47.0

ESTIMATED IMPACT TIER

Popularity (P) 0

Freshness (F) 0

Completeness (C) 0

Utility (U) 0

💬 Index Insight

The Free2AITools Nexus Index for DeepPatent aggregates Popularity (P:0), Freshness (F:0), and Completeness (C:0). The Utility score (U:0) represents deployment readiness and ecosystem adoption.



Dataset Specification

annotations_creators: []
language: en
size_categories: 100K<n<1M
task_ids: []
pretty_name: DeepPatent
tags: fiftyone, group
dataset_summary: This is a FiftyOne dataset with 46179 samples.
license: bsd-3-clause

Dataset Card for DeepPatent

This is a FiftyOne dataset with 46179 samples.

Installation

If you haven't already, install FiftyOne:

pip install -U fiftyone

Usage

import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = load_from_hub("Voxel51/DeepPatent")

# Launch the App
session = fo.launch_app(dataset)

Dataset Details

Dataset Description

DeepPatent is a large-scale dataset of technical drawings extracted from U.S. design patent documents. The dataset contains patent drawings organized by patent groups (based on publication dates) and individual patent numbers, with each patent containing multiple drawing images showing different views and aspects of the patented designs.

This implementation organizes the data into a FiftyOne grouped dataset structure, where each patent serves as a group containing multiple drawing slices.

Curated by: Michal Kucer, Diane Oyen, Juan Castorena, Jian Wu (Original 2022 dataset)
Language(s): English (metadata and patent documentation)
License: BSD 3-Clause License

Dataset Sources

Repository: https://github.com/GoFigure-LANL/DeepPatent-dataset
Paper: Kucer, M., Oyen, D., Castorena, J., & Wu, J. (2022). DeepPatent: Large Scale Patent Drawing Recognition and Retrieval. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
Related Work: DeepPatent2 - Extended dataset with over 2.7 million technical drawings (Ajayi et al., 2023)

Uses

Direct Use

The DeepPatent dataset can be directly used for:

Patent Drawing Recognition: Training models to recognize and classify patent drawings
Image Retrieval: Building search systems for finding similar patent drawings
Technical Drawing Understanding: Research on understanding technical illustrations and design documents
Multi-view Learning: Studying relationships between different views of the same design
Design Patent Analysis: Analyzing trends and patterns in design patents over time
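
For the retrieval and multi-view uses listed above, one common starting point is to treat drawings from the same patent as positive pairs. The sketch below is an illustration under that assumption, not the method of Kucer et al.; the second patent number, the file paths, and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (patent_number, filepath) records; the paths are illustrative only.
SAMPLES = [
    ("USD0806350-20180102", "USD0806350-20180102/D00000.png"),
    ("USD0806350-20180102", "USD0806350-20180102/D00001.png"),
    ("USD0806350-20180102", "USD0806350-20180102/D00002.png"),
    ("USD0806351-20180102", "USD0806351-20180102/D00000.png"),
    ("USD0806351-20180102", "USD0806351-20180102/D00001.png"),
]


def same_patent_pairs(samples):
    """Return every unordered pair of drawings that share a patent number."""
    by_patent = defaultdict(list)
    for patent_number, filepath in samples:
        by_patent[patent_number].append(filepath)

    pairs = []
    for paths in by_patent.values():
        # combinations() yields each unordered pair exactly once.
        pairs.extend(combinations(sorted(paths), 2))
    return pairs


pairs = same_patent_pairs(SAMPLES)
print(len(pairs))
```

The number of pairs grows quadratically with the number of drawings in a patent, so patents with very many drawings would likely need sampling rather than exhaustive pairing.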

Out-of-Scope Use

Commercial patent infringement or unauthorized use of patented designs
Any use that violates USPTO terms of service or patent law
Training models for generating counterfeit or infringing designs

Dataset Structure

FiftyOne Grouped Dataset Organization

The dataset is structured as a FiftyOne grouped dataset with the following organization:

Group Structure:

Group Key: patent_number (e.g., "USD0806350-20180102")
Group Field: group (automatically managed by FiftyOne)
Slices: Drawing numbers (e.g., "D00000", "D00001", "D00002", etc.)

Sample Fields:

filepath: Full path to the patent drawing image (PNG format)
patent_group: Date-based patent group identifier (e.g., "I20180102" for patents published on January 2, 2018)
patent_number: Complete patent identifier including number and date (e.g., "USD0806350-20180102")
drawing_number: Simplified drawing number extracted from filename (e.g., "D00001")
filename: Original image filename
group: FiftyOne group information linking related drawings
metadata: Image metadata (width, height, channels, MIME type, etc.)
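
The identifier fields above encode dates as plain strings. The following minimal sketch shows one way to split them, assuming the formats seen in the examples: patent_number as "<number>-<YYYYMMDD>" and patent_group as a single letter followed by YYYYMMDD. The helper names are hypothetical and are not part of FiftyOne or the dataset.

```python
import datetime as dt
from typing import NamedTuple


class PatentId(NamedTuple):
    number: str     # e.g. "USD0806350"
    date: dt.date   # date encoded after the hyphen


def parse_patent_number(patent_number: str) -> PatentId:
    """Split "<number>-<YYYYMMDD>" into the number and the encoded date."""
    number, _, stamp = patent_number.partition("-")
    return PatentId(number, dt.datetime.strptime(stamp, "%Y%m%d").date())


def parse_patent_group(patent_group: str) -> dt.date:
    """Read the date from a group id such as "I20180102".

    Assumes one leading letter followed by YYYYMMDD.
    """
    return dt.datetime.strptime(patent_group[1:], "%Y%m%d").date()


pid = parse_patent_number("USD0806350-20180102")
print(pid.number, pid.date.isoformat())
print(parse_patent_group("I20180102").isoformat())
```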

Dataset Characteristics:

Total Samples: 351,506 patent drawing images
Total Patents: Not reported in this card (each patent is one group)
Images per Patent: Highly variable (ranging from 5 to over 1,400 drawings per patent)
Drawing Number Slices: One slice per drawing number, so the slice count follows the maximum number of drawings in any patent
Image Format: PNG
Date Range: Covers multiple years of U.S. design patent publications (2018-2020 visible in this subset)
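
Because the number of images per patent varies so widely, it can help to check the distribution before training. The sketch below counts drawings per patent from a plain list of patent_number values; the list contents and the function names are hypothetical stand-ins, not values from the dataset.

```python
from collections import Counter
from statistics import median


def drawings_per_patent(patent_numbers):
    """Count how many drawings carry each patent number."""
    return Counter(patent_numbers)


def summarize(counts):
    """Return the patent count plus min, max, and median drawings per patent."""
    sizes = list(counts.values())
    return {
        "patents": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "median": median(sizes),
    }


# Hypothetical input: one patent_number entry per drawing image.
patent_numbers = [
    "USD0806350-20180102",
    "USD0806350-20180102",
    "USD0806350-20180102",
    "USD0806351-20180102",
    "USD0806351-20180102",
]

summary = summarize(drawings_per_patent(patent_numbers))
print(summary)
```

With FiftyOne, the input list could plausibly come from dataset.select_group_slices().values("patent_number"), which flattens all slices before reading the field; that call is untested here.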

Grouping Benefits:

All drawings from the same patent are linked together via the group structure
Easy access to all views/drawings for a specific patent
Efficient querying by patent characteristics
Natural organization for multi-view and sequential learning tasks
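
The grouping described above can be pictured as a two-level mapping: patent number, then drawing slice, then image path. The sketch below builds that mapping in plain Python to illustrate the layout. It does not use FiftyOne's own group API, and the records, paths, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records using the field names from this card.
SAMPLES = [
    {"patent_number": "USD0806350-20180102", "drawing_number": "D00000",
     "filepath": "USD0806350-20180102/D00000.png"},
    {"patent_number": "USD0806350-20180102", "drawing_number": "D00001",
     "filepath": "USD0806350-20180102/D00001.png"},
    {"patent_number": "USD0806351-20180102", "drawing_number": "D00000",
     "filepath": "USD0806351-20180102/D00000.png"},
]


def build_group_index(samples):
    """Map patent_number -> {drawing_number: filepath}."""
    index = defaultdict(dict)
    for sample in samples:
        index[sample["patent_number"]][sample["drawing_number"]] = sample["filepath"]
    return dict(index)


index = build_group_index(SAMPLES)

# All drawings for one patent, keyed by slice name.
print(sorted(index["USD0806350-20180102"]))

# The same slice across every patent that has it.
print([views["D00000"] for views in index.values() if "D00000" in views])
```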

Dataset Creation

Curation Rationale

The DeepPatent dataset was created to address the lack of large-scale datasets for technical drawing understanding and patent analysis. Design patents represent a unique domain of technical illustrations that combine artistic design with functional representation, making them valuable for computer vision research.

Source Data

Data Collection and Processing

Source: U.S. Patent and Trademark Office (USPTO) design patent documents
Collection Method: Automated extraction of drawing figures from published design patent PDFs
Time Period: Multiple years of patent publications (at least 2018-2020 in this subset)
Processing: Drawing images extracted, organized by patent number and publication date
Drawing Numbering: Original USPTO drawing numbers simplified (e.g., "D00012-1465" → "D00012")
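
The drawing-number simplification above amounts to keeping the part before the first hyphen. This minimal sketch assumes that rule from the single example given; the function name is hypothetical and the dataset's actual processing code may differ.

```python
def simplify_drawing_number(raw: str) -> str:
    """Keep the text before the first hyphen, e.g. "D00012-1465" -> "D00012"."""
    return raw.split("-", 1)[0]


print(simplify_drawing_number("D00012-1465"))
```

Values that contain no hyphen are returned unchanged.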

Who are the source data producers?

The source data producers are:

Primary Source: U.S. Patent and Trademark Office (USPTO)
Patent Applicants: Individual inventors, companies, and design firms who filed design patents
Dataset Curators: Research team led by Michal Kucer, Diane Oyen, Juan Castorena, and Jian Wu

Annotations

The dataset primarily consists of unannotated patent drawing images with metadata. The original paper (Kucer et al., 2022) may have included additional annotations for specific tasks like recognition and retrieval.

Available Metadata:

Patent publication dates (encoded in group names)
Patent numbers (unique identifiers)
Drawing sequence numbers
Image technical metadata (dimensions, format)

Citation

BibTeX

@inproceedings{kucer2022deeppatent,
  title={DeepPatent: Large Scale Patent Drawing Recognition and Retrieval},
  author={Kucer, Michal and Oyen, Diane and Castorena, Juan and Wu, Jian},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2022}
}

Related Work

For the extended dataset with additional annotations:

@article{ajayi2023deeppatent2,
  title={DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding},
  author={Ajayi, Kehinde and Wei, Xin and Gryder, Martin and Shields, Winston and Wu, Jian and Jones, Shawn M. and Kucer, Michal and Oyen, Diane},
  journal={arXiv preprint arXiv:2311.04098},
  year={2023}
}


🛡️ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: hf-dataset--voxel51--deeppatent
source: huggingface
author: Voxel51
tags: language:en, license:bsd-3-clause, size_categories:100k, modality:image, library:fiftyone, arxiv:2311.04098, region:us, fiftyone, group


📊 Engagement & Metrics

likes: 1
downloads: 84,151

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)
