Dataset

Flint2025

by Ryanray Umich
Nexus Index
29.0 Top 100%
S: Semantic 50
A: Authority 0
P: Popularity 55
R: Recency 28
Q: Quality 30
Dataset Information Summary
Entity Passport
Registry ID hf-dataset--ryanray-umich--flint2025
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__ryanray_umich__flint2025,
  author = {Ryanray Umich},
  title = {Flint2025 Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/ryanray-umich/flint2025}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Ryanray Umich. (2026). Flint2025 [Dataset]. Free2AITools. https://huggingface.co/datasets/ryanray-umich/flint2025

🔬 Technical Deep Dive

âš–ī¸ Nexus Index V2.0

29.0
TOP 100% SYSTEM IMPACT
Semantic (S) 50
Authority (A) 0
Popularity (P) 55
Recency (R) 28
Quality (Q) 30

đŸ’Ŧ Index Insight

FNI V2.0 for Flint2025: Semantic (S:50), Authority (A:0), Popularity (P:55), Recency (R:28), Quality (Q:30).

Free2AITools Nexus Index

Verification Authority

Unbiased Data Node Refresh: VFS Live
⬇️
Downloads
56,107

đŸ‘ī¸ Data Preview


Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.


🧬 Field Logic


Schema not yet indexed for this dataset.

Dataset Specification

Summary

The dataset covers 81,000 domains drawn from 20 large website categories. For each domain we collected a screenshot (1024x1024), the final document state (as HTML), and a PCAP file containing all of the network traffic. All collection ran inside a Docker container so that the captured network traffic is not cross-contaminated. You can reproduce our work using our scraping tool and the domain list in this repository. Unfortunately, the domain categories have not been approved for release at this time.

This repository is about 275 GB: 224 GB of PCAP files, 29 GB of screenshots, and 22 GB of webpages.
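As a quick check of the size breakdown, the three components sum to the stated total, and PCAP files account for most of it. The sketch below uses only the figures stated above; the combined HTML-plus-screenshot figure is a rough estimate that assumes those files are stored unchanged in the smaller variant.

```python
# Component sizes in GB, as stated in the dataset description.
sizes_gb = {"pcap": 224, "screenshots": 29, "html": 22}

total_gb = sum(sizes_gb.values())  # should match the ~275 GB total

# Percentage of the repository taken up by each component.
shares_pct = {name: round(100 * size / total_gb, 1) for name, size in sizes_gb.items()}

# Rough size of HTML + screenshots alone (assumption: stored unchanged elsewhere).
html_and_screenshots_gb = sizes_gb["html"] + sizes_gb["screenshots"]

print(total_gb, shares_pct, html_and_screenshots_gb)
# prints: 275 {'pcap': 81.5, 'screenshots': 10.5, 'html': 8.0} 51
```

In other words, skipping the PCAP files cuts the download by roughly four fifths.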

Usage

This is a reference repository, intended for replicating our work. Unless you absolutely need all of the PCAP metadata, you probably should not use it directly.

If you are training on packets, you will likely want multiple samples per domain, but this dataset has only one sample per domain. Consider Flint2025-IP (https://example.com) instead, which has multiple samples per domain. It is more compact because we convert the PCAP format into CSV and drop the payload. We have included as many features as possible while keeping the total size under 300 GB.
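The exact conversion script and column set used for Flint2025-IP are not described here. The following is only a minimal sketch of what "convert PCAP to CSV and drop the payload" can look like, assuming classic libpcap files (not pcapng) captured with the Ethernet link type. The function names `iter_packet_metadata` and `pcap_to_csv` and the CSV column names are hypothetical.

```python
import csv
import ipaddress
import struct
from typing import BinaryIO, Iterator, TextIO

# Magic numbers of the classic libpcap format (pcapng is NOT handled).
_MAGIC_USEC = 0xA1B2C3D4
_MAGIC_NSEC = 0xA1B23C4D
_LINKTYPE_ETHERNET = 1
_ETHERTYPE_IPV4 = 0x0800

# Hypothetical column set: header metadata only, no payload bytes.
FIELDS = ["timestamp", "captured_len", "original_len", "src_ip", "dst_ip", "ip_proto"]


def iter_packet_metadata(stream: BinaryIO) -> Iterator[dict]:
    """Yield per-packet header metadata from a classic pcap stream; payloads are discarded."""
    header = stream.read(24)  # pcap global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number tells us the file's byte order and timestamp resolution.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (_MAGIC_USEC, _MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file")
    ts_divisor = 1_000_000 if magic == _MAGIC_USEC else 1_000_000_000
    linktype = struct.unpack(endian + "I", header[20:24])[0]

    while True:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            return  # end of file
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)  # raw packet bytes, used only for header parsing
        row = {
            "timestamp": ts_sec + ts_frac / ts_divisor,
            "captured_len": incl_len,
            "original_len": orig_len,
            "src_ip": "",
            "dst_ip": "",
            "ip_proto": "",
        }
        # Decode only Ethernet + IPv4 headers; other link types keep empty IP fields.
        if linktype == _LINKTYPE_ETHERNET and len(data) >= 34:
            ethertype = struct.unpack(">H", data[12:14])[0]
            if ethertype == _ETHERTYPE_IPV4:
                ip = data[14:]
                row["ip_proto"] = ip[9]
                row["src_ip"] = str(ipaddress.IPv4Address(ip[12:16]))
                row["dst_ip"] = str(ipaddress.IPv4Address(ip[16:20]))
        yield row  # the payload bytes are never written out


def pcap_to_csv(stream: BinaryIO, out: TextIO) -> int:
    """Write one CSV row of header metadata per packet; return the packet count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in iter_packet_metadata(stream):
        writer.writerow(row)
        count += 1
    return count
```

Usage would be along the lines of opening a capture with `open(path, "rb")` and an output file with `open(out_path, "w", newline="")`, then calling `pcap_to_csv`. Keeping only header fields is what makes such a representation much smaller than the raw captures.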

If you are training on HTML and/or screenshots, you do not need to download all of the PCAP files. Instead, use Flint2025-small, which contains only the HTML and screenshots.

📊 Structured Schema

| Feature key | Data type |
|---|---|
| image | Image |
| label | ClassLabel |

Estimated Rows: 133
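The schema pairs an `image` column with a `label` column of type ClassLabel, and the tags list the format as `imagefolder`. Under the Hugging Face `datasets` imagefolder convention, labels are usually inferred from the name of each image's parent directory. Below is a minimal sketch of that convention, assuming a directory-per-class layout. The function name and the directory names in the example are hypothetical (the real category names have not been released), and the actual builder applies additional rules not reproduced here.

```python
from pathlib import PurePosixPath


def infer_class_labels(paths: list[str]) -> tuple[list[str], list[int]]:
    """Imagefolder-style labeling: each file's parent directory name is its class."""
    dir_names = [PurePosixPath(p).parent.name for p in paths]
    # ClassLabel names, sorted alphabetically so that indices are deterministic.
    names = sorted(set(dir_names))
    index = {name: i for i, name in enumerate(names)}
    # Integer label for each file, in input order.
    return names, [index[d] for d in dir_names]


names, ids = infer_class_labels(
    ["train/news/a.png", "train/banking/b.png", "train/news/c.png"]
)
print(names, ids)  # prints: ['banking', 'news'] [1, 0, 1]
```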


AI Summary: Based on Hugging Face metadata. Not a recommendation.


đŸ›Ąī¸ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id
hf-dataset--ryanray-umich--flint2025
slug
ryanray-umich--flint2025
source
huggingface
author
Ryanray Umich
license
tags
size_categories:n<1k, format:imagefolder, modality:image, library:datasets, library:mlcroissant, region:us

âš™ī¸ Technical Specs

architecture
null
params billions
null
context length
null
pipeline tag

📊 Engagement & Metrics

downloads
56,107
stars
0
forks
0

Data indexed from public sources. Updated daily.