Outlier detection in regression: conic quadratic formulations
Abstract
In many applications, when building linear regression models, it is important to account for the presence of outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, typically relying on the linearization of the cubic terms using big-M constraints, suffer from weak relaxation and poor performance in practice. In this work we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders-of-magnitude faster than existing big-M formulations in the literature for this problem.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2023
- DOI:
- arXiv:
- arXiv:2307.05975
- Bibcode:
- 2023arXiv230705975G
- Keywords:
-
- Mathematics - Optimization and Control;
- Computer Science - Machine Learning;
- Statistics - Methodology;
- Statistics - Machine Learning