Star-Galaxy Classification Using Data Mining Techniques with Considerations for Unbalanced Datasets
Abstract
We applied a range of data-mining techniques to improve the classification of stars and galaxies in imaging data from the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS), with objects extracted using SExtractor. We found that the Artificial Neural Network (ANN) achieved higher accuracies than Support Vector Machines, but was outperformed by the Random Forest and Decision Tree techniques on 5000 randomly sampled objects. This has potentially negative implications for SExtractor, which uses an ANN to produce a measure of stellarity for each object. We found that the classification of stars and galaxies can be improved by voting (between Decision Trees, Random Forests and ANNs) and by using balanced datasets. For the balanced datasets that we created, the three data-mining techniques agreed on the type of object over 80% of the time.
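The two ideas named in the abstract, balanced datasets and voting between three classifiers, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the use of random undersampling to balance the classes, and the reading of "agreed" as unanimity among all three classifiers are assumptions made for illustration. The label lists are toy values, not survey results.

```python
import random
from collections import Counter


def balance_by_undersampling(objects, labels, seed=0):
    """Randomly undersample every class to the size of the rarest class.

    Assumption: undersampling is only one way to build a balanced dataset;
    the abstract does not say which method was used.
    """
    rng = random.Random(seed)
    by_class = {}
    for obj, label in zip(objects, labels):
        by_class.setdefault(label, []).append(obj)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for obj in rng.sample(members, n_min):
            balanced.append((obj, label))
    rng.shuffle(balanced)
    return balanced


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per object.

    With three classifiers and two classes a strict majority always exists.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def unanimous_fraction(predictions):
    """Fraction of objects on which every classifier gives the same label."""
    rows = list(zip(*predictions))
    return sum(len(set(row)) == 1 for row in rows) / len(rows)


# Toy, hypothetical predictions standing in for a Decision Tree,
# a Random Forest and an ANN on five objects.
dt_pred = ["star", "galaxy", "galaxy", "star", "galaxy"]
rf_pred = ["star", "galaxy", "star", "star", "galaxy"]
ann_pred = ["galaxy", "galaxy", "galaxy", "star", "galaxy"]

voted = majority_vote([dt_pred, rf_pred, ann_pred])
agreement = unanimous_fraction([dt_pred, rf_pred, ann_pred])

# Toy unbalanced sample: 8 galaxies and 2 stars, identified by index.
toy_labels = ["galaxy"] * 8 + ["star"] * 2
balanced = balance_by_undersampling(list(range(10)), toy_labels)
```

In this sketch the voted label for each object is whichever class at least two of the three classifiers chose, and the balanced sample keeps equal numbers of stars and galaxies so that the more numerous class does not dominate training.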
- Publication: Astronomical Data Analysis Software and Systems XVIII
- Pub Date: September 2009
- Bibcode: 2009ASPC..411..318O