Quovadis Speakeasy
This dataset was created and further refined as part of the following two publications: - "Quo Vadis: Hybrid Machine Learning Meta-Model Based on Contextual and Behavioral Malware Representations", Trizna et al., 2022, https://dl.acm.org/doi/10.1145/3560830.3563726 - "Nebula: Self-Attention for Dynamic Malware Analysis", Trizna et al., 2024, https://ieeexplore.ieee.org/document/10551436 If you used it in your research, please cite us: Arxiv references of both paper...
| Entity Passport | |
| Registry ID | hf-dataset--wy777--quovadis-speakeasy |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__wy777__quovadis_speakeasy,
author = {wy777},
title = {Quovadis Speakeasy Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/wy777/quovadis-speakeasy}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Technical Deep Dive
Nexus Index V2.0
Index Insight
FNI V2.0 for Quovadis Speakeasy: Semantic (S:50), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).
Verification Authority
Dataset Specification
license: apache-2.0
About Dataset
Citation
This dataset was created and further refined as part of the following two publications:
"Quo Vadis: Hybrid Machine Learning Meta-Model Based on Contextual and Behavioral Malware Representations", Trizna et al., 2022, https://dl.acm.org/doi/10.1145/3560830.3563726
"Nebula: Self-Attention for Dynamic Malware Analysis", Trizna et al., 2024, https://ieeexplore.ieee.org/document/10551436
If you used it in your research, please cite us:
@inproceedings{quovadis,
author = {Trizna, Dmitrijs},
title = {Quo Vadis: Hybrid Machine Learning Meta-Model Based on Contextual and Behavioral Malware Representations},
year = {2022},
isbn = {9781450398800},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3560830.3563726},
doi = {10.1145/3560830.3563726},
booktitle = {Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security},
pages = {127--136},
numpages = {10},
keywords = {reverse engineering, neural networks, malware, emulation, convolutions},
location = {Los Angeles, CA, USA},
series = {AISec'22}
}
@ARTICLE{nebula,
author={Trizna, Dmitrijs and Demetrio, Luca and Biggio, Battista and Roli, Fabio},
journal={IEEE Transactions on Information Forensics and Security},
title={Nebula: Self-Attention for Dynamic Malware Analysis},
year={2024},
volume={19},
number={},
pages={6155-6167},
keywords={Malware;Feature extraction;Data models;Analytical models;Long short term memory;Task analysis;Encoding;Malware;transformers;dynamic analysis;convolutional neural networks},
doi={10.1109/TIFS.2024.3409083}}
arXiv versions of both papers: arxiv.org/abs/2208.12248 (Quo Vadis) and arxiv.org/abs/2310.10664 (Nebula).
Description
This dataset contains behavioral reports obtained with the Speakeasy emulator from 93533 32-bit portable executable (PE) files.
It is a complementary dataset to https://huggingface.co/datasets/dtrizna/quovadis-ember, which provides static EMBER features of the same samples.
To reflect concept drift in malware:
- 76126 files that form a training set were collected in Jan 2022.
- 17407 files that form a test set were collected in Apr 2022.
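The split is temporal rather than random: training files come from an earlier collection window than test files. The sketch below illustrates that idea and checks the split sizes quoted above. The dataset already ships pre-split, so this is only an illustration; the sample records, the `temporal_split` helper, and the cutoff date of 1 April 2022 are hypothetical (the source gives only the months).

```python
from datetime import date
from typing import Iterable, List, Tuple

# Split sizes as stated in the description.
TRAIN_FILES = 76126
TEST_FILES = 17407
TOTAL_FILES = TRAIN_FILES + TEST_FILES

# (sample id, collection date); both fields are hypothetical placeholders.
Sample = Tuple[str, date]


def temporal_split(
    samples: Iterable[Sample], cutoff: date
) -> Tuple[List[Sample], List[Sample]]:
    """Put samples collected before `cutoff` in train, the rest in test."""
    train: List[Sample] = []
    test: List[Sample] = []
    for sample in samples:
        if sample[1] < cutoff:
            train.append(sample)
        else:
            test.append(sample)
    return train, test


# Toy records standing in for real samples.
samples = [
    ("sample_a", date(2022, 1, 10)),
    ("sample_b", date(2022, 1, 25)),
    ("sample_c", date(2022, 4, 5)),
]
# Assumed cutoff: any date between the two collection windows would do.
train, test = temporal_split(samples, cutoff=date(2022, 4, 1))

print(f"total files: {TOTAL_FILES}")
print(f"train share: {TRAIN_FILES / TOTAL_FILES:.3f}")
print(f"toy split: {len(train)} train, {len(test)} test")
```

Evaluating on later-collected samples gives a more realistic estimate of how a detector degrades over time than a random split would.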
Labels
Files located in report_clean and report_windows_syswow64 are clean (benign). All other folders contain malware from 7 families. The number of files in each folder:
Training set
- report_backdoor : 11062
- report_clean : 24434
- report_coinminer : 6891
- report_dropper : 8243
- report_keylogger : 4378
- report_ransomware : 9627
- report_rat : 1697
- report_trojan : 8733
- report_windows_syswow64 : 236
Test set
- report_backdoor : 1940
- report_clean : 7944
- report_coinminer : 1684
- report_dropper : 252
- report_keylogger : 1041
- report_ransomware : 2139
- report_rat : 1258
- report_trojan : 1085
- report_windows_syswow64 : 59
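The folder names above can be turned into labels directly. The following is a minimal sketch, assuming the label is derived purely from the folder name; the helper names `labels_for` and `summarize` are hypothetical and not part of the dataset or of any library.

```python
from typing import Dict, Tuple

# Folders whose reports are benign, per the Labels section.
BENIGN_FOLDERS = {"report_clean", "report_windows_syswow64"}

# Per-folder counts copied from the lists above.
TRAIN_COUNTS: Dict[str, int] = {
    "report_backdoor": 11062, "report_clean": 24434,
    "report_coinminer": 6891, "report_dropper": 8243,
    "report_keylogger": 4378, "report_ransomware": 9627,
    "report_rat": 1697, "report_trojan": 8733,
    "report_windows_syswow64": 236,
}
TEST_COUNTS: Dict[str, int] = {
    "report_backdoor": 1940, "report_clean": 7944,
    "report_coinminer": 1684, "report_dropper": 252,
    "report_keylogger": 1041, "report_ransomware": 2139,
    "report_rat": 1258, "report_trojan": 1085,
    "report_windows_syswow64": 59,
}


def labels_for(folder: str) -> Tuple[int, str]:
    """Return (binary label, class name); 0 = benign, 1 = malware."""
    if not folder.startswith("report_"):
        raise ValueError(f"unexpected folder name: {folder!r}")
    if folder in BENIGN_FOLDERS:
        return 0, "clean"
    return 1, folder[len("report_"):]


def summarize(counts: Dict[str, int]) -> Dict[str, int]:
    """Aggregate per-folder counts into total, benign and malware."""
    total = sum(counts.values())
    benign = sum(n for f, n in counts.items() if f in BENIGN_FOLDERS)
    return {"total": total, "benign": benign, "malware": total - benign}


families = sorted({labels_for(f)[1] for f in TRAIN_COUNTS} - {"clean"})
print("malware families:", families)
print("train:", summarize(TRAIN_COUNTS))
print("test:", summarize(TEST_COUNTS))
```

Summing the listed folder counts gives 75301 training and 17402 test files, slightly below the split sizes quoted in the Description, so totals are best recomputed from the files actually downloaded.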
Dataset Transparency Report
Verified data manifest for traceability and transparency.
Identity & Source
- id: hf-dataset--wy777--quovadis-speakeasy
- source: huggingface
- author: wy777
- tags: license:apache-2.0, arxiv:2310.10664, arxiv:2208.12248, region:us
Engagement & Metrics
- likes: 0
- downloads: 43,519