Knowledge distillation for optimization of quantized deep neural networks

doi:10.48550/arXiv.1909.01688

Knowledge distillation (KD) is a widely used method for model size reduction. Recently, the technique has been exploited for training quantized deep neural networks (QDNNs) as a way to restore the performance sacrificed by word-length reduction. KD, however, introduces additional hyper-parameters for QDNN training, such as the temperature, the coefficient, and the size of the teacher network. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that these hyper-parameters are inter-related, and we also introduce a simple and effective technique that reduces the coefficient during training. With KD employing the proposed hyper-parameters, we achieve test accuracies of 92.7% and 67.0% on ResNet20 with 2-bit ternary weights for the CIFAR-10 and CIFAR-100 data sets, respectively.
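
To make the roles of the temperature and the coefficient concrete, the following is a minimal Python sketch of a commonly used KD loss. The abstract does not give the paper's exact loss or schedule, so the formulation here is an assumption: it weights the hard-label cross-entropy by (1 - coefficient) and the temperature-softened teacher/student KL term by the coefficient. The function names (`softmax`, `kd_loss`, `coefficient_schedule`) and the linear decay with its `start` and `end` values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature, coefficient):
    """Assumed KD loss: (1 - c) * CE(label) + c * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL divergence between softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)
    # T^2 keeps the soft-term gradient scale comparable across temperatures.
    return (1.0 - coefficient) * hard + coefficient * temperature ** 2 * soft


def coefficient_schedule(epoch, total_epochs, start=0.9, end=0.0):
    """Hypothetical linear decay of the KD coefficient over training."""
    frac = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * frac


if __name__ == "__main__":
    student = [1.0, 0.2, -0.5]
    teacher = [2.0, 0.1, -1.0]
    for epoch in (0, 5, 9):
        c = coefficient_schedule(epoch, total_epochs=10)
        loss = kd_loss(student, teacher, label=0, temperature=4.0, coefficient=c)
        print(f"epoch {epoch}: coefficient={c:.2f}, loss={loss:.4f}")
```

With a decaying coefficient, the teacher's soft targets dominate early in training and the hard labels dominate later. Whether the decay should be linear, stepped, or something else is not stated in the abstract.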

Publication:

arXiv e-prints

Pub Date:

September 2019

DOI:

10.48550/arXiv.1909.01688

arXiv:

arXiv:1909.01688

Bibcode:

2019arXiv190901688S

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning
