Using rare category detection algorithms to find unique sample classes in ChemCam data from Curiosity's first 90 sols in Gale crater, Mars
Abstract
The ChemCam instrument on the Mars Science Laboratory rover Curiosity transmits an average of 800 individual laser-induced breakdown spectroscopy (LIBS) spectra per sol, sampled from one to three targets. Investigating each of these spectra individually is extremely time-consuming, yet without doing so a rare but particularly informative sample might be overlooked. We have addressed this rare category detection problem by developing and applying machine learning algorithms whose explicit goal is to let the domain expert identify all classes in the dataset while answering as few queries as possible. Our algorithm combines active learning with semi-supervised learning that exploits the underlying density structure of the data. The approach builds a tree of hierarchical clusters based on the Euclidean distances between the spectra, with each wavelength taken as a separate feature. Starting at the top of the tree, a node corresponding to a cluster of spectra is selected. The user is presented with the two spectra from this cluster that are furthest apart in feature space and asked whether or not they belong to the same class. If the domain expert deems the two 'query' spectra to belong to different classes, then the two sub-nodes that combine to form the selected node are marked for further examination. If the two spectra are considered to be in the same class, then this branch is pruned from the tree and the user is asked for a label. We examined the performance of the algorithm on a synthetic dataset, where it required significantly fewer queries than a random selection strategy. A domain expert then applied the algorithm to 7,000 LIBS spectra from the first 90 sols, which led to the rapid identification of an instrument artifact and suggested comparisons between samples that the domain expert would otherwise not have made.
Using this technique, we have classified ChemCam spectra into discrete groups; we anticipate that this will aid geologists in understanding the spatial distribution of geochemical results in Gale crater up to Sol 90 of the mission.
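The query loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names (`farthest_pair`, `explore`, `make_synthetic`), the choice of average linkage, the breadth-first visiting order, and the simulated expert are all assumptions that the abstract does not specify. Only the general idea is taken from the text: build a Euclidean hierarchical cluster tree, show the expert the two most distant spectra in a node, then either descend into the two sub-nodes or prune the branch and record a label.

```python
from collections import deque

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import pdist, squareform


def farthest_pair(X, ids):
    """Return the two members of `ids` that are furthest apart (Euclidean)."""
    d = squareform(pdist(X[ids], metric="euclidean"))
    i, j = np.unravel_index(np.argmax(d), d.shape)
    return ids[i], ids[j]


def explore(X, same_class, get_label, method="average"):
    """Walk the cluster tree top-down, querying a (hypothetical) expert.

    same_class(a, b) -> bool and get_label(a) -> label stand in for the
    domain expert. The linkage `method` is an assumption; the abstract only
    states that Euclidean distance is used.
    """
    root = to_tree(linkage(X, method=method, metric="euclidean"))
    labels = np.full(len(X), None, dtype=object)
    n_queries = 0                 # times the expert is shown spectra
    pending = deque([root])       # nodes marked for examination
    while pending:
        node = pending.popleft()
        ids = node.pre_order()    # row indices of the spectra under this node
        n_queries += 1
        if node.is_leaf():
            # A single spectrum cannot be compared; just ask for its label.
            labels[ids] = get_label(ids[0])
            continue
        a, b = farthest_pair(X, ids)
        if same_class(a, b):
            # Prune: the whole branch inherits the label given by the expert.
            labels[ids] = get_label(a)
        else:
            # Mark both sub-nodes for further examination.
            pending.extend([node.get_left(), node.get_right()])
    return labels, n_queries


def make_synthetic(seed=0):
    """Toy data: two common classes and one rare class, 20 'wavelengths'."""
    rng = np.random.default_rng(seed)
    sizes, centers = [100, 100, 5], [0.0, 10.0, 20.0]
    X = np.vstack([rng.normal(c, 0.5, size=(n, 20))
                   for n, c in zip(sizes, centers)])
    truth = np.repeat([0, 1, 2], sizes)
    return X, truth


X, truth = make_synthetic()
# Simulated expert that answers from the ground-truth labels.
labels, n_queries = explore(
    X,
    same_class=lambda a, b: truth[a] == truth[b],
    get_label=lambda a: truth[a],
)
print(f"{len(set(labels))} classes found with {n_queries} queries "
      f"for {len(X)} spectra")
```

Two caveats apply to this sketch. Agreement between the two most distant spectra is a heuristic and does not guarantee that every spectrum in a pruned branch shares their class. The dense pairwise distance matrix in `farthest_pair` also grows quadratically with cluster size, so a dataset of several thousand spectra may need a more memory-efficient search for the most distant pair.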
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2013
- Bibcode: 2013AGUFMIN33A1532L
- Keywords:
  - 0555 COMPUTATIONAL GEOPHYSICS Neural networks, fuzzy logic, machine learning
  - 1916 INFORMATICS Data and information discovery
  - 5470 PLANETARY SCIENCES: SOLID SURFACE PLANETS Surface materials and properties