Fitting Multiple Machine Learning Models with Performance Based Clustering

doi:10.48550/arXiv.2411.06572

Traditional machine learning approaches assume that data comes from a single generating mechanism, an assumption that may not hold for most real-life data. When it fails, the single-mechanism assumption can result in suboptimal performance. We introduce a clustering framework that eliminates this assumption by grouping the data according to the relations between the features and the target values, and we obtain multiple separate models, each learning a different part of the data. We further extend our framework to streaming-data applications, where we produce outcomes using an ensemble of models whose weights are updated based on the incoming data batches. We demonstrate the performance of our approach on widely studied real-life datasets, showing significant improvements over traditional single-model approaches.
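The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the two ideas could be realized, not the authors' method. It assumes linear least-squares models, an alternating "fit, then reassign each sample to the model with the lowest squared error" loop for the clustering step, and an exponential (multiplicative) weight update for the streaming ensemble. All function names, the learning rate `eta`, and the choice of squared error are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    """Predictions of a model produced by fit_linear."""
    return np.column_stack([X, np.ones(len(X))]) @ w

def performance_based_clustering(X, y, k=2, n_iter=20, seed=0):
    """Assumed scheme: alternate between fitting one model per cluster
    and moving each sample to the model that predicts it best."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))  # random initial grouping
    models = [None] * k
    min_points = X.shape[1] + 1  # enough samples to determine a linear model
    for _ in range(n_iter):
        for j in range(k):
            mask = labels == j
            if mask.sum() >= min_points:
                models[j] = fit_linear(X[mask], y[mask])
            elif models[j] is None:
                # Fallback for a tiny initial cluster: fit on all data.
                models[j] = fit_linear(X, y)
            # Otherwise keep the previous model for this cluster.
        # Per-sample squared error under every model, shape (n_samples, k).
        errors = np.stack(
            [(predict_linear(w, X) - y) ** 2 for w in models], axis=1
        )
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return models, labels

def update_ensemble_weights(weights, models, X_batch, y_batch, eta=1.0):
    """Assumed streaming rule: shrink each model's weight exponentially
    in its mean squared error on the new batch, then renormalize."""
    losses = np.array(
        [np.mean((predict_linear(w, X_batch) - y_batch) ** 2) for w in models]
    )
    new_weights = weights * np.exp(-eta * losses)
    return new_weights / new_weights.sum()
```

In this sketch, a streaming prediction for a batch would be the weighted sum of the individual model predictions, with `update_ensemble_weights` called once the targets for that batch become available.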

Publication: arXiv e-prints
Pub Date: November 2024
DOI: 10.48550/arXiv.2411.06572
arXiv: arXiv:2411.06572
Bibcode: 2024arXiv241106572L
Keywords: Computer Science - Machine Learning; Electrical Engineering and Systems Science - Signal Processing

NASA/ADS
