Hardware-Efficient Mixed-Precision CP Tensor Decomposition

doi:10.48550/arXiv.2209.04003

Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often incurs high memory and computing costs. Meanwhile, modern computing hardware such as tensor processing units (TPUs) and Tensor Core GPUs has opened a new window for hardware-efficient computing via mixed- or low-precision arithmetic representations. In this paper, we exploit low-precision representations in tensor factorization and propose a mixed-precision block stochastic gradient descent (SGD) method to reduce the cost of CP tensor decomposition. Our method achieves robust and fast convergence via a two-stage optimization, i.e., SignSGD followed by mixed-precision SGD. A detailed theoretical analysis proves the convergence of the proposed mixed-precision algorithm. Numerical experiments on both synthetic and realistic tensor data sets show the superior efficiency of our mixed-precision algorithm compared to full-precision CP decomposition. This work can substantially reduce the memory, computing, and energy costs on resource-constrained edge computing devices. We demonstrate this benefit via an FPGA prototype.
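To make the two update styles named in the abstract concrete, here is a minimal NumPy sketch for a 3-way tensor. It is an illustration under stated assumptions, not the authors' algorithm: the loss 0.5 * ||X - [[A, B, C]]||_F^2, the full (non-block, non-stochastic) gradients, the float32 master copy with float16 working copies, and all function names are choices made here for brevity and do not come from the paper.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: X[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_gradients(X, A, B, C):
    """Gradients of 0.5 * ||X - [[A, B, C]]||_F^2 with respect to each factor matrix."""
    R = cp_reconstruct(A, B, C) - X  # residual tensor
    gA = np.einsum("ijk,jr,kr->ir", R, B, C)
    gB = np.einsum("ijk,ir,kr->jr", R, A, C)
    gC = np.einsum("ijk,ir,jr->kr", R, A, B)
    return gA, gB, gC


def sign_sgd_step(factors, grads, lr):
    """SignSGD-style update: move every entry by lr against the sign of its gradient."""
    return [F - lr * np.sign(G) for F, G in zip(factors, grads)]


def mixed_precision_step(factors32, grads, lr):
    """Assumed mixed-precision scheme: update a float32 master copy,
    then round to float16 working copies for low-precision arithmetic."""
    new32 = [(F - lr * G).astype(np.float32) for F, G in zip(factors32, grads)]
    low16 = [F.astype(np.float16) for F in new32]
    return new32, low16
```

A two-stage run in this spirit would call `sign_sgd_step` for an initial number of iterations and then switch to `mixed_precision_step`; the switch point and the step sizes are tuning choices that this sketch leaves open.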

Publication: arXiv e-prints

Pub Date: September 2022

DOI: 10.48550/arXiv.2209.04003

arXiv: arXiv:2209.04003

Bibcode: 2022arXiv220904003Y

Keywords: Mathematics - Optimization and Control
