Locality Sensitive Hash Aggregated Nonlinear Neighbourhood Matrix Factorization for Online Sparse Big Data Analysis
Matrix factorization (MF) can extract the low-rank features and integrate the information of the data manifold distribution from high-dimensional data, which can consider the nonlinear neighbourhood information. Thus, MF has drawn wide attention for low-rank analysis of sparse big data, e.g., Collaborative Filtering (CF) Recommender Systems, Social Networks, and Quality of Service. However, the following two problems exist: 1) huge computational overhead for the construction of the Graph Similarity Matrix (GSM), and 2) huge memory overhead for the intermediate GSM. Therefore, GSM-based MF, e.g., kernel MF, graph regularized MF, etc., cannot be directly applied to the low-rank analysis of sparse big data on cloud and edge platforms. To solve this intractable problem for sparse big data analysis, we propose Locality Sensitive Hashing (LSH) aggregated MF (LSH-MF), which can solve the following problems: 1) The proposed probabilistic projection strategy of LSH-MF can avoid the construction of the GSM. Furthermore, LSH-MF can satisfy the requirement for the accurate projection of sparse big data. 2) To run LSH-MF for fine-grained parallelization and online learning on GPUs, we also propose CULSH-MF, which works on CUDA parallelization. Experimental results show that CULSH-MF can not only reduce the computational time and memory overhead but also obtain higher accuracy. Compared with deep learning models, CULSH-MF can not only save training time but also achieve the same accuracy performance.