Efficient L1-Norm Principal-Component Analysis via Bit Flipping
Abstract
It was shown recently that the $K$ L1-norm principal components (L1-PCs) of a real-valued data matrix $\mathbf X \in \mathbb R^{D \times N}$ ($N$ data samples of $D$ dimensions) can be exactly calculated with cost $\mathcal{O}(2^{NK})$ or, when advantageous, $\mathcal{O}(N^{dK - K + 1})$ where $d=\mathrm{rank}(\mathbf X)$, $K<d$ [1],[2]. In applications where $\mathbf X$ is large (e.g., "big" data of large $N$ and/or "heavy" data of large $d$), these costs are prohibitive. In this work, we present a novel suboptimal algorithm for the calculation of the $K < d$ L1-PCs of $\mathbf X$ of cost $\mathcal O(ND \min \{ N,D\} + N^2(K^4 + dK^2) + dNK^3)$, which is comparable to that of standard (L2-norm) PC analysis. Our theoretical and experimental studies show that the proposed algorithm calculates the exact optimal L1-PCs with high frequency and achieves higher value in the L1-PC optimization metric than any known alternative algorithm of comparable computational cost. The superiority of the calculated L1-PCs over standard L2-PCs (singular vectors) in characterizing potentially faulty data/measurements is demonstrated with experiments on data dimensionality reduction and disease diagnosis from genomic data.
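The bit-flipping idea is easiest to see for $K=1$: maximizing $\|\mathbf X^{\!\top} \mathbf q\|_1$ over unit vectors $\mathbf q$ is equivalent to maximizing $\|\mathbf X \mathbf b\|_2$ over antipodal sign vectors $\mathbf b \in \{\pm 1\}^N$, with optimizer $\mathbf q = \mathbf X \mathbf b / \|\mathbf X \mathbf b\|_2$; a local search then flips one sign at a time while the metric improves. Below is a minimal illustrative sketch of this $K=1$ greedy bit-flipping idea (function name, initialization, and tolerance are our own choices, not the paper's exact algorithm):

```python
import numpy as np

def l1pca_bitflip(X, n_iter=100, seed=0):
    """Greedy bit-flipping sketch for the K=1 L1 principal component.

    For K=1, max_{||q||=1} ||X^T q||_1 equals max_{b in {-1,1}^N} ||X b||_2,
    and the maximizing direction is q = X b / ||X b||_2.  Starting from a
    random sign vector, flip single bits greedily while ||X b||_2 grows.
    """
    D, N = X.shape
    rng = np.random.default_rng(seed)
    b = rng.choice([-1.0, 1.0], size=N)       # initial antipodal vector
    best = np.linalg.norm(X @ b)
    for _ in range(n_iter):
        improved = False
        for n in range(N):
            b[n] = -b[n]                      # trial flip of bit n
            val = np.linalg.norm(X @ b)
            if val > best + 1e-12:            # keep the flip if metric grows
                best, improved = val, True
            else:
                b[n] = -b[n]                  # revert the flip
        if not improved:                      # local optimum in bit flips
            break
    q = X @ b
    return q / np.linalg.norm(q), b
```

Each pass costs $\mathcal O(ND)$ per candidate flip (or less with incremental updates), which is the source of the algorithm's SVD-comparable overall complexity; the returned $\mathbf q$ always satisfies $\|\mathbf X^{\!\top}\mathbf q\|_1 \ge \|\mathbf X \mathbf b\|_2$, so the L1 metric is at least the final bit-flipping score.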
 Publication:

IEEE Transactions on Signal Processing
 Pub Date:
 August 2017
 DOI:
 10.1109/TSP.2017.2708023
 arXiv:
 arXiv:1610.01959
 Bibcode:
 2017ITSP...65.4252M
 Keywords:

 Computer Science - Data Structures and Algorithms;
 Computer Science - Machine Learning;
 Statistics - Machine Learning