Finding rare objects and building pure samples: Probabilistic quasar classification with Gaia
Abstract
We have developed a method for classifying rare objects in surveys, with the particular goal of building very pure samples. It works by modifying the output probabilities from a classifier to accommodate our prior expectations of the relative frequencies of the different classes of objects. We demonstrate the method using the Discrete Source Classifier (DSC), a supervised classifier currently based on Support Vector Machines, which we are developing in preparation for the Gaia data analysis. DSC classifies objects using their very low resolution optical spectra. We look in detail at the problem of quasar classification, partly because a pure quasar sample is necessary to define the Gaia astrometric reference frame. By varying a posterior probability threshold in DSC (P_t), we can trade off sample completeness against contamination. We show, using our simulated data, that it is possible to achieve a pure sample of quasars (upper limit on contamination of 1 in 40 000) with a completeness of 65% at magnitudes of G = 18.5, and 50% at G = 20.0, even when quasars have a frequency of only 1 in every 2000 objects. The star sample completeness is simultaneously more than 99%, with a contamination of 0.7%. Including parallax and proper motion in the classifier barely changes the results. Not accounting for class priors leads to serious misclassifications and to poor predictions of sample completeness and contamination. Because our method controls the prior explicitly, a single model can be applied to any target population without tuning the training data or retraining the model.
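The prior modification and the threshold trade-off described in the abstract can be illustrated with a minimal sketch. It assumes the standard Bayesian reweighting, in which each class probability is multiplied by the ratio of the target prior to the prior implied by the training set and the result is renormalized; the abstract does not spell out the exact form used in DSC, so this is an illustration rather than the authors' implementation. The function names (`adjust_for_priors`, `completeness_contamination`) and the two-class column layout (star, quasar) are hypothetical.

```python
import numpy as np


def adjust_for_priors(probs, train_priors, target_priors):
    """Reweight classifier output probabilities for a new set of class priors.

    Assumption (not taken from the abstract): the classifier outputs behave
    like posteriors under `train_priors`, so multiplying by
    target_priors / train_priors and renormalizing gives posteriors for the
    target population.
    """
    probs = np.asarray(probs, dtype=float)
    ratio = np.asarray(target_priors, dtype=float) / np.asarray(train_priors, dtype=float)
    weighted = probs * ratio
    # Renormalize each row so the class probabilities sum to one.
    return weighted / weighted.sum(axis=1, keepdims=True)


def completeness_contamination(post, true_labels, cls, threshold):
    """Completeness and contamination of the sample with P(cls) > threshold.

    Assumes the labelled test set has the same class mix as the target
    population; otherwise the counts would need reweighting.
    """
    post = np.asarray(post, dtype=float)
    true_labels = np.asarray(true_labels)
    selected = post[:, cls] > threshold
    is_cls = true_labels == cls
    n_true = is_cls.sum()
    n_selected = selected.sum()
    completeness = (selected & is_cls).sum() / n_true if n_true else float("nan")
    contamination = (selected & ~is_cls).sum() / n_selected if n_selected else float("nan")
    return completeness, contamination


# Column 0 = star, column 1 = quasar. A model trained on equal class
# fractions is applied to a population with 1 quasar per 2000 objects.
raw = np.array([[0.01, 0.99]])
adjusted = adjust_for_priors(raw, [0.5, 0.5], [1999 / 2000, 1 / 2000])
```

In this toy case an object that the equal-prior model considers 99% likely to be a quasar drops to a posterior below 5% once the 1-in-2000 prior is applied, which is why ignoring class priors gives misleading contamination estimates. Raising the threshold passed to `completeness_contamination` lowers contamination at the cost of completeness.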
- Publication:
- Classification and Discovery in Large Astronomical Surveys
- Pub Date:
- December 2008
- DOI:
- 10.1063/1.3059079
- arXiv:
- arXiv:0809.3373
- Bibcode:
- 2008AIPC.1082....3B
- Keywords:
- 98.54.Aj;
- 95.75.Pq;
- 95.80.+p;
- Quasars;
- Mathematical procedures and computer techniques;
- Astronomical catalogs, atlases, sky surveys, databases, retrieval systems, archives, etc.;
- Astrophysics;
- Physics - Data Analysis;
- Statistics and Probability;
- Statistics - Machine Learning
- E-Print:
- MNRAS accepted