Exploring the Effect of Weighting Data of Rare Sub-Samples on Classification

Machine learning is taking a more prominent role in astronomy research as observational and simulated data sets swell beyond a human's capacity to digest them. When training a classifier on data in which one class is intrinsically rare, it is common to up-weight examples of the rare class so that the model does not ignore it. For the same reason, it is also common to train on a restricted data set in which the balance of source types is closer to equal. Here we show that these practices can bias the model toward over-assigning sources to the rare class, and we show how to compensate for this bias.
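The abstract does not say which compensation the authors use. As an illustration only, the sketch below applies the standard prior-shift correction for a binary classifier: it rescales the predicted rare-class probability by the ratio of the true class fraction to the fraction the model effectively saw during training. The function names, the 1% rare fraction, and the weight of 99 are hypothetical values chosen for the example, not taken from the source.

```python
import numpy as np


def effective_prior(true_prior, rare_weight):
    """Rare-class fraction the model effectively sees after up-weighting.

    Assumption: each rare example counts `rare_weight` times in the loss,
    and each common example counts once.
    """
    weighted_rare = rare_weight * true_prior
    return weighted_rare / (weighted_rare + (1.0 - true_prior))


def correct_for_prior_shift(p_rare, train_prior, true_prior):
    """Map probabilities learned under `train_prior` back to `true_prior`.

    This is the textbook Bayes-rule adjustment for a change in class
    priors; it assumes the class-conditional feature distributions are
    the same in the training and target populations.
    """
    p_rare = np.asarray(p_rare, dtype=float)
    rare_term = p_rare * (true_prior / train_prior)
    common_term = (1.0 - p_rare) * ((1.0 - true_prior) / (1.0 - train_prior))
    return rare_term / (rare_term + common_term)


# Hypothetical numbers: 1% of sources are rare, up-weighted by a factor of 99,
# which makes the two classes look equally common to the model.
true_prior = 0.01
train_prior = effective_prior(true_prior, rare_weight=99.0)

raw = np.array([0.1, 0.5, 0.9])  # probabilities from the weighted model
corrected = correct_for_prior_shift(raw, train_prior, true_prior)
print(train_prior, corrected)
```

In this toy setup, a source that the weighted model treats as a coin flip is pulled back to roughly the true 1% base rate, which is one way to see why uncorrected outputs over-assign sources to the rare class.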

Publication: American Astronomical Society Meeting Abstracts #235
Pub Date: January 2020
Bibcode: 2020AAS...23546003L
