Symbolic analysis meets federated learning to enhance malware identifier

doi:10.48550/arXiv.2204.14159

Over the past years, manually creating detection rules has become impractical for anti-malware products because the number of malware threats keeps growing. Turning to machine learning is therefore a promising way to make malware recognition more efficient. Traditional centralized machine learning requires a large amount of data to train a high-performing model. To improve malware detection, the training data may come from various sources, such as host, network, and cloud-based anti-malware components, or even from different enterprises. To avoid the cost of data collection and the leakage of private data, we present a federated learning system that identifies malware through behavioural graphs, i.e., system call dependency graphs. It is based on a deep learning model consisting of a graph autoencoder and a multi-classifier module. The model is trained with a secure learning protocol among clients to protect their private data against inference attacks. Using this model to identify malware, we achieve an accuracy of 85% on homogeneous graph data and 93% on inhomogeneous graph data.
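The abstract does not say which aggregation rule or secure protocol the clients use. Purely as an illustration, the following minimal sketch assumes FedAvg-style averaging of client model weights (weighted by each client's sample count) combined with pairwise additive masking, a common secure-aggregation idea: the server only learns the aggregate, never an individual client's update. The function names, the fixed-point parameters, and the seed derivation are all hypothetical and are not the authors' protocol; a real deployment would derive pairwise seeds from a key agreement and handle client dropouts.

```python
import random

SCALE = 2 ** 16      # fixed-point precision (assumption)
MODULUS = 2 ** 40    # size of the integer ring used for masking (assumption)


def encode(x):
    """Map a float to a fixed-point integer in [0, MODULUS)."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Inverse of encode: values in the upper half of the ring are negative."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def pairwise_mask(i, j, dim, round_id):
    # Hypothetical shared seed: both clients of the pair derive the same
    # stream. A real protocol would use a secret from a key exchange.
    rng = random.Random(f"{min(i, j)}-{max(i, j)}-{round_id}")
    return [rng.randrange(MODULUS) for _ in range(dim)]


def mask_update(client_id, weights, n_samples, all_ids, round_id):
    """Client side: encode n_samples * weights and hide it behind pairwise masks."""
    masked = [encode(w * n_samples) for w in weights]
    for other in all_ids:
        if other == client_id:
            continue
        mask = pairwise_mask(client_id, other, len(weights), round_id)
        # The lower id adds the mask and the higher id subtracts it,
        # so every pair's masks cancel in the server-side sum.
        sign = 1 if client_id < other else -1
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked


def secure_fed_avg(masked_updates, client_sizes):
    """Server side: sum masked updates, then divide by the total sample count."""
    dim = len(masked_updates[0])
    total = sum(client_sizes)
    summed = [sum(u[k] for u in masked_updates) % MODULUS for k in range(dim)]
    return [decode(s) / total for s in summed]


if __name__ == "__main__":
    # Toy example: three clients, each with a 2-parameter model and a sample count.
    clients = {0: ([0.2, -1.0], 30), 1: ([0.4, 0.5], 10), 2: ([-0.1, 2.0], 60)}
    ids = list(clients)
    masked = [mask_update(c, w, n, ids, round_id=1) for c, (w, n) in clients.items()]
    print(secure_fed_avg(masked, [n for _, n in clients.values()]))
```

Each pairwise mask is added by one client and subtracted by the other, so the masks cancel in the modular sum. Working in integer fixed-point arithmetic makes that cancellation exact, which floating-point addition would not guarantee.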

Publication:

arXiv e-prints

Pub Date:

April 2022

DOI:

10.48550/arXiv.2204.14159

arXiv:

arXiv:2204.14159

Bibcode:

2022arXiv220414159D

Keywords:

Computer Science - Cryptography and Security
