Fitting Multiple Machine Learning Models with Performance Based Clustering
Abstract
Traditional machine learning approaches assume that data comes from a single generating mechanism, an assumption that may not hold for most real-life data. In such cases, the single-mechanism assumption can result in suboptimal performance. We introduce a clustering framework that removes this assumption by grouping the data according to the relations between the features and the target values, and we obtain multiple separate models, each learning a different part of the data. We further extend our framework to streaming-data applications, where we produce outcomes using an ensemble of models whose weights are updated as new data batches arrive. We demonstrate the performance of our approach on widely studied real-life datasets, showing significant improvements over traditional single-model approaches.
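The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not specify the model family, loss, or weight-update rule, so everything below is an assumption: one scalar feature, linear least-squares models, squared error, a hard "assign each point to the model that fits it best, then refit" alternation (in the style of clusterwise regression), and an exponential-weights update for the streaming ensemble. The function names `fit_line`, `performance_clustering`, `update_weights`, and `ensemble_predict` are hypothetical and are not the authors' method or any library's API.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single scalar feature."""
    n = len(xs)
    if n == 0:
        # Empty cluster: fall back to a zero model so it can re-capture points.
        return (0.0, 0.0)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: only the intercept is identifiable.
        return (0.0, my)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return (a, my - a * mx)

def predict(model, x):
    a, b = model
    return a * x + b

def performance_clustering(xs, ys, k, iters=20):
    """Hypothetical sketch: alternate between (1) fitting one model per cluster
    and (2) moving each point to the model with the smallest squared error."""
    labels = [i % k for i in range(len(xs))]  # deterministic round-robin start
    models = []
    for _ in range(iters):
        models = []
        for c in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
        # Ties go to the lowest cluster index (min returns the first minimum).
        new_labels = [
            min(range(k), key=lambda c: (predict(models[c], x) - y) ** 2)
            for x, y in zip(xs, ys)
        ]
        if new_labels == labels:
            break  # assignments are stable, so each model matches its cluster
        labels = new_labels
    return models, labels

def update_weights(weights, models, batch_x, batch_y, eta=1.0):
    """Assumed exponential-weights update from each model's MSE on a new batch."""
    losses = [
        sum((predict(m, x) - y) ** 2 for x, y in zip(batch_x, batch_y)) / len(batch_x)
        for m in models
    ]
    best = min(losses)
    # Subtracting the best loss leaves the normalized weights unchanged but
    # keeps the best model's factor at exp(0) = 1, avoiding total underflow.
    raw = [w * math.exp(-eta * (loss - best)) for w, loss in zip(weights, losses)]
    total = sum(raw)
    return [w / total for w in raw]

def ensemble_predict(weights, models, x):
    """Weighted average of the individual model predictions."""
    return sum(w * predict(m, x) for w, m in zip(weights, models))

if __name__ == "__main__":
    # Two interleaved mechanisms: y = 2x and y = 2x + 100 on the same x range.
    xs = list(range(10)) * 2
    ys = [2 * x for x in range(10)] + [2 * x + 100 for x in range(10)]
    models, labels = performance_clustering(xs, ys, k=2)
    print(models)  # [(2.0, 100.0), (2.0, 0.0)]
    # A new batch drawn from the first mechanism shifts weight toward its model.
    weights = update_weights([0.5, 0.5], models, [0, 1, 2], [0, 2, 4], eta=0.001)
    print(ensemble_predict(weights, models, 3))
```

On this toy data a single least-squares line would average the two mechanisms into roughly y = 2x + 50, whereas the alternation recovers both lines exactly. The hard reassignment step can converge to a local optimum that depends on the initial labels, so a practical version would typically try several initializations.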
- Publication: arXiv e-prints
- Pub Date: November 2024
- DOI: 10.48550/arXiv.2411.06572
- arXiv: arXiv:2411.06572
- Bibcode: 2024arXiv241106572L
- Keywords: Computer Science - Machine Learning; Electrical Engineering and Systems Science - Signal Processing