Automating Outlier Detection via Meta-Learning
Abstract
Given an unsupervised outlier detection (OD) task on a new dataset, how can we automatically select a good outlier detection method and its hyperparameter(s) (collectively called a model)? Thus far, model selection for OD has been a "black art"; as any model evaluation is infeasible due to the lack of (i) hold-out data with labels, and (ii) a universal objective function. In this work, we develop the first principled data-driven approach to model selection for OD, called MetaOD, based on meta-learning. MetaOD capitalizes on the past performances of a large body of detection models on existing outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without using any labels. To capture task similarity, we introduce specialized meta-features that quantify outlying characteristics of a dataset. Through comprehensive experiments, we show the effectiveness of MetaOD in selecting a detection model that significantly outperforms the most popular outlier detectors (e.g., LOF and iForest) as well as various state-of-the-art unsupervised meta-learners while being extremely fast. To foster reproducibility and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2020
- DOI:
- arXiv:
- arXiv:2009.10606
- Bibcode:
- 2020arXiv200910606Z
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Information Retrieval;
- Statistics - Machine Learning
- E-Print:
- 21 pages. The code is available at http://github.com/yzhao062/MetaOD