Latent Semantic Structure in Malicious Programs
Abstract
Latent Semantic Analysis is a method of matrix decomposition used for discovering topics and topic weights in natural language documents. This study uses Latent Semantic Analysis to analyze the composition of binaries of malicious programs. The semantic representation of the term frequency vector representation yields a set of topics, each topic being a composition of terms. The vectors and topics were evaluated quantitatively using a spatial representation. This semantic analysis provides a more abstract representation of the program derived from its term frequency analysis. We use a metric space to represent a program as a collection of vectors, and a distance metric to evaluate their similarity within a topic. The segmentation of the vectors in this dataset provides increased resolution into the program structure.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2022
- DOI:
- 10.48550/arXiv.2210.17390
- arXiv:
- arXiv:2210.17390
- Bibcode:
- 2022arXiv221017390M
- Keywords:
-
- Computer Science - Cryptography and Security
- E-Print:
- 11 pages, 5 figures