Bi-Sparse Unsupervised Feature Selection
Abstract
To efficiently deal with high-dimensional datasets in many areas, unsupervised feature selection (UFS) has become a rising technique for dimension reduction. Even though there are many UFS methods, most of them only consider the global structure of datasets by embedding a single sparse regularization or constraint. In this paper, we introduce a novel bi-sparse UFS method, called BSUFS, to simultaneously characterize both global and local structures. The core idea of BSUFS is to incorporate $\ell_{2,p}$-norm and $\ell_q$-norm into the classical principal component analysis (PCA), which enables our proposed method to select relevant features and filter out irrelevant noise accurately. Here, the parameters $p$ and $q$ are within the range of [0,1). Therefore, BSUFS not only constructs a unified framework for bi-sparse optimization, but also includes some existing works as special cases. To solve the resulting non-convex model, we propose an efficient proximal alternating minimization (PAM) algorithm using Riemannian manifold optimization and sparse optimization techniques. Theoretically, PAM is proven to have global convergence, i.e., for any random initial point, the generated sequence converges to a critical point that satisfies the first-order optimality condition. Extensive numerical experiments on synthetic and real-world datasets demonstrate the effectiveness of our proposed BSUFS. Specifically, the average accuracy (ACC) is improved by at least 4.71% and the normalized mutual information (NMI) is improved by at least 3.14% on average compared to the existing UFS competitors. The results validate the advantages of bi-sparse optimization in feature selection and show its potential for other fields in image processing. Our code will be available at https://github.com/xianchaoxiu.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.16819
- Bibcode:
- 2024arXiv241216819X
- Keywords:
-
- Mathematics - Optimization and Control;
- Computer Science - Machine Learning