DeepPatent
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__voxel51__deeppatent,
author = {Voxel51},
title = {DeepPatent Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/Voxel51/DeepPatent}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Dataset Specification
annotations_creators: []
language: en
size_categories: 100K<n<1M
task_ids: []
pretty_name: DeepPatent
tags: fiftyone, group
license: bsd-3-clause
Dataset Card for DeepPatent

This is a FiftyOne dataset with 46179 samples.
Installation
If you haven't already, install FiftyOne:
pip install -U fiftyone
Usage
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = load_from_hub("Voxel51/DeepPatent")

# Launch the App
session = fo.launch_app(dataset)
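The usage note mentions `max_samples`. A minimal sketch of loading only a small subset for a quick look could be written as below. The helper name `load_preview` and its default of 100 are illustrative and not part of the dataset card, and the card does not say how `max_samples` interacts with the grouped structure, so treat the result as a preview only.

```python
def load_preview(max_samples=100):
    """Load a small subset of DeepPatent from the Hugging Face Hub.

    `load_preview` and its default value are hypothetical; only the
    `max_samples` argument is named in the dataset card.
    """
    # Imported lazily so that defining this helper does not require FiftyOne
    from fiftyone.utils.huggingface import load_from_hub

    return load_from_hub("Voxel51/DeepPatent", max_samples=max_samples)
```

Calling `load_preview(50)` would then return a dataset that can be passed to `fo.launch_app` in the same way as the full dataset.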
Dataset Details
Dataset Description
DeepPatent is a large-scale dataset of technical drawings extracted from U.S. design patent documents. The dataset contains patent drawings organized by patent groups (based on publication dates) and individual patent numbers, with each patent containing multiple drawing images showing different views and aspects of the patented designs.
This implementation organizes the data into a FiftyOne grouped dataset structure, where each patent serves as a group containing multiple drawing slices.
- Curated by: Michal Kucer, Diane Oyen, Juan Castorena, Jian Wu (Original 2022 dataset)
- Language(s): English (metadata and patent documentation)
- License: BSD 3-Clause License
Dataset Sources
- Repository: https://github.com/GoFigure-LANL/DeepPatent-dataset
- Paper: Kucer, M., Oyen, D., Castorena, J., & Wu, J. (2022). DeepPatent: Large Scale Patent Drawing Recognition and Retrieval. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
- Related Work: DeepPatent2 - Extended dataset with over 2.7 million technical drawings (Ajayi et al., 2023)
Uses
Direct Use
The DeepPatent dataset can be directly used for:
- Patent Drawing Recognition: Training models to recognize and classify patent drawings
- Image Retrieval: Building search systems for finding similar patent drawings
- Technical Drawing Understanding: Research on understanding technical illustrations and design documents
- Multi-view Learning: Studying relationships between different views of the same design
- Design Patent Analysis: Analyzing trends and patterns in design patents over time
Out-of-Scope Use
- Commercial patent infringement or unauthorized use of patented designs
- Any use that violates USPTO terms of service or patent law
- Training models for generating counterfeit or infringing designs
Dataset Structure
FiftyOne Grouped Dataset Organization
The dataset is structured as a FiftyOne grouped dataset with the following organization:
Group Structure:
- Group Key: patent_number (e.g., "USD0806350-20180102")
- Group Field: group (automatically managed by FiftyOne)
- Slices: Drawing numbers (e.g., "D00000", "D00001", "D00002", etc.)
Sample Fields:
- filepath: Full path to the patent drawing image (PNG format)
- patent_group: Date-based patent group identifier (e.g., "I20180102" for patents published on January 2, 2018)
- patent_number: Complete patent identifier including number and date (e.g., "USD0806350-20180102")
- drawing_number: Simplified drawing number extracted from filename (e.g., "D00001")
- filename: Original image filename
- group: FiftyOne group information linking related drawings
- metadata: Image metadata (width, height, channels, MIME type, etc.)
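Because the identifier fields encode the publication date, they can be decoded with the standard library alone. The sketch below assumes that the part of `patent_number` after the hyphen is a `YYYYMMDD` date and that `patent_group` is a single letter followed by the same date format. Both assumptions are inferred from the two examples given here, not from a documented specification, and the function names are illustrative.

```python
from datetime import date, datetime

def parse_patent_number(patent_number):
    """Split e.g. "USD0806350-20180102" into (patent id, publication date)."""
    patent_id, _, stamp = patent_number.partition("-")
    # Assumption: the suffix after the hyphen is a YYYYMMDD date
    return patent_id, datetime.strptime(stamp, "%Y%m%d").date()

def parse_patent_group(patent_group):
    """Convert e.g. "I20180102" into a date, assuming one leading letter."""
    return datetime.strptime(patent_group[1:], "%Y%m%d").date()

patent_id, published = parse_patent_number("USD0806350-20180102")
print(patent_id, published.isoformat())             # USD0806350 2018-01-02
print(parse_patent_group("I20180102").isoformat())  # 2018-01-02
```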
Dataset Characteristics:
- Total Samples: 351,506 patent drawing images
- Total Patents: Variable number of unique patents (each patent is one group)
- Images per Patent: Highly variable (ranging from 5 to over 1,400 drawings per patent)
- Drawing Number Slices: Varies based on the maximum number of drawings in any patent
- Image Format: PNG
- Date Range: Covers multiple years of U.S. design patent publications (2018-2020 visible in this subset)
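Since each patent is one group, the number of patents and the spread of drawings per patent can be derived by counting samples per `patent_number`. The sketch below works on plain dictionaries rather than on the dataset itself. The records are made-up stand-ins for samples (the second patent number is invented for illustration), and `drawings_per_patent` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical records standing in for samples; the values are illustrative only
records = [
    {"patent_number": "USD0806350-20180102", "drawing_number": "D00000"},
    {"patent_number": "USD0806350-20180102", "drawing_number": "D00001"},
    {"patent_number": "USD0000000-20180102", "drawing_number": "D00000"},
]

def drawings_per_patent(records):
    """Count how many drawing records share each patent_number."""
    return Counter(record["patent_number"] for record in records)

counts = drawings_per_patent(records)
print(len(counts))           # number of distinct patents in the records
print(min(counts.values()))  # fewest drawings in any patent
print(max(counts.values()))  # most drawings in any patent
```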
Grouping Benefits:
- All drawings from the same patent are linked together via the group structure
- Easy access to all views/drawings for a specific patent
- Efficient querying by patent characteristics
- Natural organization for multi-view and sequential learning tasks
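As a rough sketch of how the group structure might be used in practice, the helpers below read the group properties of a loaded dataset and collect every drawing of one patent. They rely on FiftyOne's grouped-dataset interface (`media_type`, `group_field`, `default_group_slice`, `group_slices`, `select_group_slices`) and assume the `patent_number` and `filepath` fields listed above. The function names are illustrative, and the code has not been checked against this particular dataset.

```python
def describe_groups(dataset):
    """Return basic group information for a FiftyOne grouped dataset."""
    return {
        "media_type": dataset.media_type,  # "group" for grouped datasets
        "group_field": dataset.group_field,
        "default_slice": dataset.default_group_slice,
        "num_slices": len(dataset.group_slices),
    }

def drawings_for_patent(dataset, patent_number):
    """Return the file paths of all drawings that share one patent_number."""
    # Imported lazily so that defining this helper does not require FiftyOne
    from fiftyone import ViewField as F

    # Flatten all slices into a single view, then filter on the patent_number field
    flat_view = dataset.select_group_slices()
    return flat_view.match(F("patent_number") == patent_number).values("filepath")
```

With a loaded `dataset`, `drawings_for_patent(dataset, "USD0806350-20180102")` would be expected to return one path per drawing slice that exists for that patent.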
Dataset Creation
Curation Rationale
The DeepPatent dataset was created to address the lack of large-scale datasets for technical drawing understanding and patent analysis. Design patents represent a unique domain of technical illustrations that combine artistic design with functional representation, making them valuable for computer vision research.
Source Data
Data Collection and Processing
- Source: U.S. Patent and Trademark Office (USPTO) design patent documents
- Collection Method: Automated extraction of drawing figures from published design patent PDFs
- Time Period: Multiple years of patent publications (at least 2018-2020 in this subset)
- Processing: Drawing images extracted, organized by patent number and publication date
- Drawing Numbering: Original USPTO drawing numbers simplified (e.g., "D00012-1465" → "D00012")
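The drawing-number simplification can be illustrated with a one-line helper. This sketch assumes the rule is simply "keep everything before the first hyphen", which matches the single example given but is not confirmed by the original processing code. `simplify_drawing_number` is a hypothetical name.

```python
def simplify_drawing_number(raw):
    """Keep the part before the first hyphen, e.g. "D00012-1465" -> "D00012"."""
    # Assumption: the simplified number is the prefix before the first hyphen
    return raw.split("-", 1)[0]

print(simplify_drawing_number("D00012-1465"))  # D00012
```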
Who are the source data producers?
The source data producers are:
- Primary Source: U.S. Patent and Trademark Office (USPTO)
- Patent Applicants: Individual inventors, companies, and design firms who filed design patents
- Dataset Curators: Research team led by Michal Kucer, Diane Oyen, Juan Castorena, and Jian Wu
Annotations
The dataset primarily consists of unannotated patent drawing images with metadata. The original paper (Kucer et al., 2022) may have included additional annotations for specific tasks like recognition and retrieval.
Available Metadata:
- Patent publication dates (encoded in group names)
- Patent numbers (unique identifiers)
- Drawing sequence numbers
- Image technical metadata (dimensions, format)
Citation
BibTeX
@inproceedings{kucer2022deeppatent,
title={DeepPatent: Large Scale Patent Drawing Recognition and Retrieval},
author={Kucer, Michal and Oyen, Diane and Castorena, Juan and Wu, Jian},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2022}
}
Related Work
For the extended dataset with additional annotations:
@article{ajayi2023deeppatent2,
title={DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding},
author={Ajayi, Kehinde and Wei, Xin and Gryder, Martin and Shields, Winston and Wu, Jian and Jones, Shawn M. and Kucer, Michal and Oyen, Diane},
journal={arXiv preprint arXiv:2311.04098},
year={2023}
}