Dataset

Massid45 Specimens

by annaviklund hf-dataset--annaviklund--massid45-specimens

Entity Passport
Registry ID hf-dataset--annaviklund--massid45-specimens
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__annaviklund__massid45_specimens,
  author = {annaviklund},
  title = {Massid45 Specimens Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/annaviklund/massid45-specimens}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
annaviklund. (2026). Massid45 Specimens [Dataset]. Free2AITools. https://huggingface.co/datasets/annaviklund/massid45-specimens


âš–ī¸ Nexus Index V16.5

40.0
ESTIMATED IMPACT TIER
Popularity (P) 0
Freshness (F) 0
Completeness (C) 0
Utility (U) 0

💬 Index Insight

The Free2AITools Nexus Index for Massid45 Specimens aggregates Popularity (P:0), Freshness (F:0), and Completeness (C:0). The Utility score (U:0) represents deployment readiness and ecosystem adoption.

⬇️ Downloads 10,763

đŸ‘ī¸ Data Preview

📊

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

🔗 Explore Full Dataset ↗

đŸ§Ŧ Field Logic

đŸ§Ŧ

Schema not yet indexed for this dataset.

Dataset Specification


language:
- en
pretty_name: MassID45 Individual Specimens
description: Images and metadata for individual specimens in the MassID45 dataset.
size_categories:
- 10K<n<100K

MassID45 Individual Specimens

MassID45 is a multi-modal dataset comprising DNA barcodes (COI) and images at two resolutions: images of complete bulk insect samples and images of every individual specimen from those samples. This repository contains images and metadata for 35,586 individual specimens.

The data for individual specimens is available on BOLD under recordset code DS-LPEPA22. This Hugging Face dataset makes the images easier to access in bulk.

Further information about the MassID45 dataset is available in the preprint "A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level". Code for preprocessing and training/inference is available on GitHub.

Dataset structure

Images

The images directory contains specimen photographs at two resolutions. Within each resolution, images are organized into subfolders named after bulk sample IDs (see the fieldid column in metadata/records.csv).

  • images/originals: High-resolution images of individual specimens
  • images/thumbnails: Lower-resolution thumbnail versions of the same specimens

File path format: images/[resolution]/[fieldid]/[processid].jpg

File path example: images/originals/G1P4N6/LPUAE001-22.jpg
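
The path format can be assembled and taken apart programmatically. The following is a minimal sketch, not part of the dataset's own tooling: the helper names image_path and parse_image_path are hypothetical, and the only facts taken from the text are the two resolution folders and the images/[resolution]/[fieldid]/[processid].jpg layout.

```python
from pathlib import PurePosixPath

RESOLUTIONS = ("originals", "thumbnails")  # the two folders listed above


def image_path(resolution: str, fieldid: str, processid: str) -> str:
    """Build images/[resolution]/[fieldid]/[processid].jpg as a POSIX path."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}, got {resolution!r}")
    return str(PurePosixPath("images", resolution, fieldid, f"{processid}.jpg"))


def parse_image_path(path: str) -> tuple[str, str, str]:
    """Split a repository-relative image path into (resolution, fieldid, processid)."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4 or parts[0] != "images" or not parts[3].endswith(".jpg"):
        raise ValueError(f"unexpected image path: {path!r}")
    return parts[1], parts[2], parts[3][: -len(".jpg")]


print(image_path("originals", "G1P4N6", "LPUAE001-22"))
# prints: images/originals/G1P4N6/LPUAE001-22.jpg
```

Using PurePosixPath keeps forward slashes on every platform, which matters because the same relative path is reused in the download URLs.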

Metadata

The metadata directory contains two CSV files with record and image information from BOLD. The metadata was collected through the BOLD APIs and has not been modified.

  • metadata/records.csv: Detailed metadata for individual specimens
  • metadata/images.csv: Image-specific metadata for the same specimens
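
Since images are grouped by bulk sample, a common first step is to group specimens by fieldid. The sketch below uses only the standard library and makes two assumptions that should be checked against the actual file: that records.csv is comma-delimited with a header row, and that it has a column named processid (only fieldid is named in the text). The function name specimens_by_sample and the in-memory sample rows are illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, IO, List


def specimens_by_sample(records_file: IO[str]) -> Dict[str, List[str]]:
    """Map each bulk sample id (fieldid) to the process ids of its specimens."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(records_file):
        # "processid" is an assumed column name; "fieldid" is named in the text.
        groups[row["fieldid"]].append(row["processid"])
    return dict(groups)


# Tiny in-memory stand-in for metadata/records.csv. The rows are made up,
# apart from the G1P4N6 / LPUAE001-22 pair used in the file path example.
sample_csv = io.StringIO(
    "processid,fieldid\n"
    "LPUAE001-22,G1P4N6\n"
    "EXAMPLE-0002,G1P4N6\n"
    "EXAMPLE-0003,SAMPLE-B\n"
)
groups = specimens_by_sample(sample_csv)
print(groups["G1P4N6"])
# prints: ['LPUAE001-22', 'EXAMPLE-0002']
```

For the real file, pass a handle opened with open("metadata/records.csv", newline="", encoding="utf-8") in place of the in-memory sample.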

Dataset access

To clone this repository, use the following command:

git clone https://huggingface.co/datasets/annaviklund/massid45-specimens/

To clone without downloading large files (Git LFS objects are left as small pointer files), use the following command:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/annaviklund/massid45-specimens/
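
After a clone that skipped large files, the images for a single bulk sample can be fetched on demand. This is a hedged sketch that assumes the images are stored through Git LFS (implied by the command above but not stated outright) and that git and git-lfs are installed. The helper names lfs_pull_command and fetch_sample are hypothetical; git lfs pull --include is the standard Git LFS option for limiting a pull to matching paths.

```python
import subprocess
from typing import List


def lfs_pull_command(resolution: str, fieldid: str) -> List[str]:
    """Return a `git lfs pull` command limited to one sample's image folder."""
    pattern = f"images/{resolution}/{fieldid}/*"
    return ["git", "lfs", "pull", "--include", pattern]


def fetch_sample(repo_dir: str, resolution: str, fieldid: str) -> None:
    """Run the command inside an existing clone (needs git and git-lfs installed)."""
    subprocess.run(lfs_pull_command(resolution, fieldid), cwd=repo_dir, check=True)


print(" ".join(lfs_pull_command("thumbnails", "G1P4N6")))
# prints: git lfs pull --include images/thumbnails/G1P4N6/*
```

Calling fetch_sample("massid45-specimens", "thumbnails", "G1P4N6") would then replace the pointer files for that one sample with the actual thumbnails, assuming the clone lives in a directory of that name.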

Direct image access

File path format: https://huggingface.co/datasets/annaviklund/massid45-specimens/resolve/main/images/[resolution]/[fieldid]/[processid].jpg

File path example: https://huggingface.co/datasets/annaviklund/massid45-specimens/resolve/main/images/originals/G1P4N6/LPUAE001-22.jpg
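
For fetching a handful of images without cloning, the URL pattern can be filled in directly. The sketch below uses only the standard library; the function names image_url and download_image are illustrative, and the base URL is copied from the format line above. No request is made until download_image is called.

```python
import shutil
import urllib.request
from urllib.parse import quote

BASE_URL = "https://huggingface.co/datasets/annaviklund/massid45-specimens/resolve/main"


def image_url(resolution: str, fieldid: str, processid: str) -> str:
    """Fill in the direct-access URL pattern for one specimen image."""
    segments = ["images", resolution, fieldid, f"{processid}.jpg"]
    # Percent-encode each segment so unusual characters cannot break the URL.
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def download_image(url: str, destination: str) -> None:
    """Stream one image to disk; urlopen raises HTTPError on a failed request."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


print(image_url("originals", "G1P4N6", "LPUAE001-22"))
# prints: https://huggingface.co/datasets/annaviklund/massid45-specimens/resolve/main/images/originals/G1P4N6/LPUAE001-22.jpg
```

A call such as download_image(image_url("thumbnails", "G1P4N6", "LPUAE001-22"), "LPUAE001-22.jpg") would save the thumbnail locally. For bulk access, cloning the repository is likely the better fit.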


đŸ›Ąī¸ Dataset Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: hf-dataset--annaviklund--massid45-specimens
source: huggingface
author: annaviklund
tags: language:en, size_categories:10K<n<100K, format:imagefolder, modality:image, modality:text, library:datasets, library:mlcroissant, arxiv:2507.06972, region:us

âš™ī¸ Technical Specs

architecture
null
params billions
null
context length
null

📊 Engagement & Metrics

likes: 0
downloads: 10,763
