Malware Task Identification: A Data Driven Approach
Abstract
Identifying the tasks a given piece of malware was designed to perform (e.g., logging keystrokes, recording video, establishing remote access) is a difficult and time-consuming operation that is largely human-driven in practice. In this paper, we present an automated method to identify malware tasks. Using two different malware collections, we explore various circumstances for each, including cases where the training data differs significantly from the test data; where the malware being evaluated employs packing to thwart analytical techniques; and conditions with sparse training data. We find that this approach consistently outperforms the current state-of-the-art software for malware task identification as well as standard machine learning approaches, often achieving an unbiased F1 score of over 0.9. In the near future, we look to deploy our approach for use by analysts in an operational cyber-security environment.
- Publication: arXiv e-prints
- Pub Date: July 2015
- DOI: 10.48550/arXiv.1507.01930
- arXiv: arXiv:1507.01930
- Bibcode: 2015arXiv150701930N
- Keywords: Computer Science - Cryptography and Security
- E-Print: 8 pages full paper, accepted FOSINT-SI (2015)