Translating Math Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training
Abstract
In this paper, we propose a deep neural network model with an encoder-decoder architecture that translates images of math formulas into their LaTeX markup sequences. The encoder is a convolutional neural network (CNN) that transforms images into a group of feature maps. To better capture the spatial relationships of math symbols, the feature maps are augmented with 2D positional encoding before being unfolded into a vector. The decoder is a stacked bidirectional long short-term memory (LSTM) model integrated with the soft attention mechanism, which works as a language model to translate the encoder output into a sequence of LaTeX tokens. The neural network is trained in two steps. The first step is token-level training using maximum-likelihood estimation (MLE) as the objective function. After the token-level training completes, a sequence-level training objective is employed to optimize the overall model based on the policy gradient algorithm from reinforcement learning. Our design also overcomes the exposure bias problem by closing the feedback loop in the decoder during sequence-level training, i.e., feeding in the predicted token instead of the ground-truth token at every time step. The model is trained and evaluated on the IM2LATEX-100K dataset and shows state-of-the-art performance on both sequence-based and image-based evaluation metrics.
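The 2D positional encoding of the feature maps mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: the even/odd sine-cosine split, the base-10000 frequencies, and the even division of channels between row and column encodings are assumptions borrowed from the standard sinusoidal encoding.

```python
import numpy as np

def positional_encoding_1d(length, dim):
    # Standard sinusoidal positional encoding along one axis.
    pos = np.arange(length)[:, None]               # (length, 1)
    i = np.arange(dim // 2)[None, :]               # (1, dim/2)
    angle = pos / np.power(10000.0, 2 * i / dim)   # (length, dim/2)
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angle)                   # even channels: sine
    enc[:, 1::2] = np.cos(angle)                   # odd channels: cosine
    return enc

def positional_encoding_2d(height, width, dim):
    # Half the channels encode the row index, half the column index,
    # so every feature-map location gets a unique (row, col) signature.
    assert dim % 4 == 0, "channel dim must be divisible by 4"
    half = dim // 2
    row_enc = positional_encoding_1d(height, half)  # (H, dim/2)
    col_enc = positional_encoding_1d(width, half)   # (W, dim/2)
    enc = np.zeros((height, width, dim))
    enc[:, :, :half] = row_enc[:, None, :]          # broadcast over columns
    enc[:, :, half:] = col_enc[None, :, :]          # broadcast over rows
    return enc

# Augment CNN feature maps of (hypothetical) shape (H, W, C) before
# unfolding them into the vector sequence consumed by the decoder:
feats = np.random.randn(8, 32, 64)
feats = feats + positional_encoding_2d(8, 32, 64)
```

Because the encoding is added elementwise before the feature maps are unfolded, the attention decoder can distinguish symbols that differ only in spatial position (e.g. subscripts versus superscripts).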
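The sequence-level training step can likewise be sketched in miniature: sample a sequence closed-loop (feeding back the model's own predictions, which is how the design avoids exposure bias) and weight its log-probability by a reward advantage, REINFORCE-style. The `step_fn` interface, the baseline term, and the reward signal here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sequence(step_fn, start_token, max_len):
    # Closed-loop decoding: each step consumes the model's own previous
    # sample rather than the ground-truth token (no exposure bias).
    tokens, log_probs = [start_token], []
    for _ in range(max_len):
        probs = step_fn(tokens[-1])         # next-token distribution
        tok = rng.choice(len(probs), p=probs)
        tokens.append(int(tok))
        log_probs.append(np.log(probs[tok]))
    return tokens[1:], float(np.sum(log_probs))

def reinforce_loss(seq_log_prob, reward, baseline):
    # Policy-gradient surrogate loss: minimizing it raises the
    # probability of sequences whose reward beats the baseline.
    return -(reward - baseline) * seq_log_prob
```

In practice the reward would be a sequence-level score of the sampled LaTeX string (for instance, a sequence- or image-based match against the reference), so the gradient directly optimizes the evaluation metric rather than per-token likelihood.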
 Publication:

arXiv e-prints
 Pub Date:
 August 2019
 DOI:
 10.48550/arXiv.1908.11415
 arXiv:
 arXiv:1908.11415
 Bibcode:
 2019arXiv190811415W
 Keywords:

 Computer Science - Machine Learning;
 Computer Science - Computer Vision and Pattern Recognition;
 Statistics - Machine Learning
 E-Print:
 11 pages, 4 figures