SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition
Abstract
Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a group of factor matrices to obtain an optimal lowrank representation feature for the \underline{H}igh\underline{O}rder, \underline{H}igh\underline{D}imension, and \underline{S}parse \underline{T}ensor (HOHDST). However, existing STD algorithms face the problem of intermediate variables explosion which results from the fact that the formation of those variables, i.e., matrices KhatriRao product, Kronecker product, and matrixmatrix multiplication, follows the whole elements in sparse tensor. The above problems prevent deep fusion of efficient computation and big data platforms. To overcome the bottleneck, a novel stochastic optimization strategy (SGD$\_$Tucker) is proposed for STD which can automatically divide the highdimension intermediate variables into small batches of intermediate matrices. Specifically, SGD$\_$Tucker only follows the randomly selected small samples rather than the whole elements, while maintaining the overall accuracy and convergence rate. In practice, SGD$\_$Tucker features the two distinct advancements over the state of the art. First, SGD$\_$Tucker can prune the communication overhead for the core tensor in distributed settings. Second, the low datadependence of SGD$\_$Tucker enables finegrained parallelization, which makes SGD$\_$Tucker obtaining lower computational overheads with the same accuracy. Experimental results show that SGD$\_$Tucker runs at least 2$X$ faster than the state of the art.
 Publication:

arXiv eprints
 Pub Date:
 December 2020
 arXiv:
 arXiv:2012.03550
 Bibcode:
 2020arXiv201203550L
 Keywords:

 Computer Science  Distributed;
 Parallel;
 and Cluster Computing