For understanding generic documents, information like font sizes, column layout, and generally the positioning of words may carry semantic information that is crucial for solving a downstream document intelligence task. Our novel BERTgrid, which is based on Chargrid by Katti et al. (2018), represents a document as a grid of contextualized word piece embedding vectors, thereby making its spatial structure and semantics accessible to the processing neural network. The contextualized embedding vectors are retrieved from a BERT language model. We use BERTgrid in combination with a fully convolutional network on a semantic instance segmentation task for extracting fields from invoices. We demonstrate its performance on tabulated line item and document header field extraction.
- Pub Date:
- September 2019
- Computer Science - Computation and Language;
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Machine Learning
- 4 pages, accepted at the "Document Intelligence" workshop of 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada