Scaling Neural Tangent Kernels via Sketching and Random Features

doi:10.48550/arXiv.2106.07880

Scaling Neural Tangent Kernels via Sketching and Random Features

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets. However, the computational complexity of kernel methods has limited its use in large-scale learning tasks. To accelerate learning with NTK, we design a near input-sparsity time approximation algorithm for NTK, by sketching the polynomial expansions of arc-cosine kernels: our sketch for the convolutional counterpart of NTK (CNTK) can transform any image using a linear runtime in the number of pixels. Furthermore, we prove a spectral approximation guarantee for the NTK matrix, by combining random features (based on leverage score sampling) of the arc-cosine kernels with a sketching algorithm. We benchmark our methods on various large-scale regression and classification tasks and show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on CIFAR-10 dataset while achieving 150x speedup.

Publication:

arXiv e-prints

Pub Date:

June 2021

DOI:

10.48550/arXiv.2106.07880

arXiv:

arXiv:2106.07880

Bibcode:

2021arXiv210607880Z

Keywords:

Computer Science - Machine Learning;
Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Data Structures and Algorithms

E-Print:

This is a merger of arXiv:2104.01351, arXiv:2104.00415

ADS

Scaling Neural Tangent Kernels via Sketching and Random Features

Abstract