Accelerating System Log Processing by Semi-supervised Learning: A Technical Report
Abstract
There is an increasing need for more automated system-log analysis tools for large scale online system in a timely manner. However, conventional way to monitor and classify the log output based on keyword list does not scale well for complex system in which codes contributed by a large group of developers, with diverse ways of encoding the error messages, often with misleading pre-set labels. In this paper, we propose that the design of a large scale online log analysis should follow the "Least Prior Knowledge Principle", in which unsupervised or semi-supervised solution with the minimal prior knowledge of the log should be encoded directly. Thereby, we report our experience in designing a two-stage machine learning based method, in which the system logs are regarded as the output of a quasi-natural language, pre-filtered by a perplexity score threshold, and then undergo a fine-grained classification procedure. Tests on empirical data show that our method has obvious advantage regarding to the processing speed and classification accuracy.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2018
- DOI:
- 10.48550/arXiv.1811.01833
- arXiv:
- arXiv:1811.01833
- Bibcode:
- 2018arXiv181101833L
- Keywords:
-
- Computer Science - Software Engineering;
- Computer Science - Computation and Language;
- Computer Science - Information Retrieval