About Dataset
Citation
This dataset was created and further refined as part of the two publications below.
If you use it in your research, please cite them:
@inproceedings{quovadis,
author = {Trizna, Dmitrijs},
title = {Quo Vadis: Hybrid Machine Learning Meta-Model Based on Contextual and Behavioral Malware Representations},
year = {2022},
isbn = {9781450398800},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3560830.3563726},
doi = {10.1145/3560830.3563726},
booktitle = {Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security},
pages = {127--136},
numpages = {10},
keywords = {reverse engineering, neural networks, malware, emulation, convolutions},
location = {Los Angeles, CA, USA},
series = {AISec'22}
}
@ARTICLE{nebula,
author={Trizna, Dmitrijs and Demetrio, Luca and Biggio, Battista and Roli, Fabio},
journal={IEEE Transactions on Information Forensics and Security},
title={Nebula: Self-Attention for Dynamic Malware Analysis},
year={2024},
volume={19},
number={},
pages={6155-6167},
keywords={Malware;Feature extraction;Data models;Analytical models;Long short term memory;Task analysis;Encoding;Malware;transformers;dynamic analysis;convolutional neural networks},
doi={10.1109/TIFS.2024.3409083}}
arXiv versions of both papers: arxiv.org/abs/2208.12248 (Quo Vadis) and arxiv.org/abs/2310.10664 (Nebula).
Description
This dataset contains behavioral reports obtained with the Speakeasy emulator from 93533 32-bit portable executables (PE).
It is a complementary dataset to https://huggingface.co/datasets/dtrizna/quovadis-ember , which contains static EMBER features of the same samples.
To reflect concept drift in malware, the training and test sets were collected three months apart:
76126 files that form the training set were collected in Jan 2022.
17407 files that form the test set were collected in Apr 2022.
Labels
Files located in report_clean and report_windows_syswow64 are clean (benign). All other folders contain malware, distributed over 7 families. The number of files in each folder:
Training set
report_backdoor : 11062
report_clean : 24434
report_coinminer : 6891
report_dropper : 8243
report_keylogger : 4378
report_ransomware : 9627
report_rat : 1697
report_trojan : 8733
report_windows_syswow64 : 236
Test set
report_backdoor : 1940
report_clean : 7944
report_coinminer : 1684
report_dropper : 252
report_keylogger : 1041
report_ransomware : 2139
report_rat : 1258
report_trojan : 1085
report_windows_syswow64 : 59
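The folder-to-label rule described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset tooling: the helper names (`folder_to_label`, `folder_to_family`, `summarize`) and the 0/1 label convention are assumptions made for this example, while the counts are copied from the lists above.

```python
# Per-folder report counts, copied from the lists above.
TRAIN_COUNTS = {
    "report_backdoor": 11062,
    "report_clean": 24434,
    "report_coinminer": 6891,
    "report_dropper": 8243,
    "report_keylogger": 4378,
    "report_ransomware": 9627,
    "report_rat": 1697,
    "report_trojan": 8733,
    "report_windows_syswow64": 236,
}
TEST_COUNTS = {
    "report_backdoor": 1940,
    "report_clean": 7944,
    "report_coinminer": 1684,
    "report_dropper": 252,
    "report_keylogger": 1041,
    "report_ransomware": 2139,
    "report_rat": 1258,
    "report_trojan": 1085,
    "report_windows_syswow64": 59,
}

# The two folders the description marks as clean (benign).
BENIGN_FOLDERS = {"report_clean", "report_windows_syswow64"}


def folder_to_label(folder: str) -> int:
    """Binary label: 0 = benign, 1 = malware (convention assumed here)."""
    return 0 if folder in BENIGN_FOLDERS else 1


def folder_to_family(folder: str) -> str:
    """Family name: the folder name without the 'report_' prefix.

    Both benign folders are mapped to a single 'clean' class (an assumption).
    """
    if folder in BENIGN_FOLDERS:
        return "clean"
    return folder[len("report_"):]


def summarize(counts: dict) -> dict:
    """Total, benign, and malware report counts for one split."""
    benign = sum(n for f, n in counts.items() if folder_to_label(f) == 0)
    total = sum(counts.values())
    return {"total": total, "benign": benign, "malware": total - benign}


if __name__ == "__main__":
    for name, counts in (("train", TRAIN_COUNTS), ("test", TEST_COUNTS)):
        print(name, summarize(counts))
```

The per-folder counts listed above add up to 75301 (training) and 17402 (test), slightly below the set sizes stated in the description; the source does not explain the difference.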