The Distributed Cloud Based Engine for Knowledge Discovery in Massive Archives of Astronomical Spectra
The current archives of large-scale spectroscopic surveys, such as SDSS or LAMOST, contain millions of spectra. As some interesting objects (e.g., emission-line stars or quasars) can be identified only by checking the shapes of certain spectral lines, machine learning techniques have to be applied, complemented by flexible visualisation of the results. We present VO-CLOUD, a distributed cloud-based engine that provides the user with a comfortable web-based environment for conducting machine learning experiments with different algorithms running on multiple nodes. It allows visual backtracking of the individual input spectra at different stages of preprocessing, which is important for checking the nature of outliers or the precision of classification. The engine consists of a single master server, which serves as the user portal, and several workers, which run various types of machine learning tasks. The master holds the database of users and their experiments, predefined configuration parameters for individual machine learning models, and a repository for data to be preprocessed. The workers differ in their capabilities, depending on the installed libraries and the hardware configuration of their host (e.g., the number of CPU cores or the GPU card type), and more workers may be added dynamically to provide new machine learning methods.
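The master and worker arrangement described above can be illustrated with a minimal Python sketch. It is not VO-CLOUD's actual implementation: the class names (`Worker`, `Master`), the method labels, and the least-loaded selection rule are all assumptions made for illustration, since the abstract does not specify how jobs are matched to workers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Worker:
    """A worker node and the job types it can run (hypothetical model)."""
    name: str
    capabilities: set[str]   # methods available via the installed libraries
    cpu_cores: int = 1
    has_gpu: bool = False
    queued: int = 0          # number of jobs currently assigned


@dataclass
class Master:
    """Keeps the worker registry and assigns jobs by capability."""
    workers: list[Worker] = field(default_factory=list)

    def register(self, worker: Worker) -> None:
        # Workers can be added at any time, bringing new methods with them.
        self.workers.append(worker)

    def supported_methods(self) -> set[str]:
        # Union of the capabilities of all registered workers.
        methods: set[str] = set()
        for worker in self.workers:
            methods |= worker.capabilities
        return methods

    def dispatch(self, method: str, needs_gpu: bool = False) -> Worker:
        # Assumed policy: among capable workers, pick the least loaded one.
        candidates = [
            w for w in self.workers
            if method in w.capabilities and (w.has_gpu or not needs_gpu)
        ]
        if not candidates:
            raise LookupError(f"no worker can run {method!r}")
        chosen = min(candidates, key=lambda w: w.queued)
        chosen.queued += 1
        return chosen
```

In this sketch, registering a worker with a previously unseen method label immediately extends `supported_methods()`, which mirrors the idea that new machine learning methods become available as workers are added.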
Astronomical Data Analysis Software and Systems XXV
- Pub Date: December 2017