Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms
Abstract
Extracting information from tables in documents presents a significant challenge in many industries and in academic research. Existing methods that take a bottom-up approach, integrating lines into cells and then into rows or columns, neglect the available prior information relating to table structure. Our proposed method takes a top-down approach. It first uses a conditional generative adversarial network to map a table image into a standardised "skeleton" table form, which denotes the approximate row and column borders without the table content. It then fits renderings of candidate latent table structures to the skeleton structure, using a distance measure optimised by a genetic algorithm.
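The second stage described in the abstract, fitting rendered candidate structures to a skeleton image with a genetic algorithm, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the skeleton is a synthetic binary image rather than GAN output, the number of row and column borders is fixed in advance, the distance is a plain pixel-wise mean absolute difference, and the names `render`, `distance`, and `evolve` are hypothetical.

```python
import numpy as np


def render(rows, cols, shape):
    """Draw a candidate structure as a binary image of border lines."""
    img = np.zeros(shape, dtype=float)
    img[np.clip(rows, 0, shape[0] - 1), :] = 1.0  # horizontal borders
    img[:, np.clip(cols, 0, shape[1] - 1)] = 1.0  # vertical borders
    return img


def distance(candidate, skeleton):
    """Assumed distance: mean absolute pixel difference to the skeleton."""
    rows, cols = candidate
    return float(np.abs(render(rows, cols, skeleton.shape) - skeleton).mean())


def evolve(skeleton, n_rows, n_cols, pop_size=40, generations=60, seed=0):
    """Toy genetic algorithm with elitism, uniform crossover and mutation."""
    rng = np.random.default_rng(seed)
    h, w = skeleton.shape
    # Each individual is a pair (row border positions, column border positions).
    pop = [(rng.integers(0, h, n_rows), rng.integers(0, w, n_cols))
           for _ in range(pop_size)]
    history = []  # best distance seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda c: distance(c, skeleton))
        history.append(distance(pop[0], skeleton))
        elite = pop[: pop_size // 4]  # the fittest quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.integers(0, len(elite), 2)
            a, b = elite[i], elite[j]
            # Uniform crossover: take each border from one of the two parents.
            child_rows = np.where(rng.random(n_rows) < 0.5, a[0], b[0])
            child_cols = np.where(rng.random(n_cols) < 0.5, a[1], b[1])
            # Mutation: shift each border by up to two pixels.
            child_rows = np.clip(child_rows + rng.integers(-2, 3, n_rows), 0, h - 1)
            child_cols = np.clip(child_cols + rng.integers(-2, 3, n_cols), 0, w - 1)
            children.append((child_rows, child_cols))
        pop = elite + children
    best = min(pop, key=lambda c: distance(c, skeleton))
    history.append(distance(best, skeleton))
    return best, history


# Synthetic stand-in for a GAN-produced skeleton: 4 row and 3 column borders.
true_rows = np.array([0, 10, 20, 31])
true_cols = np.array([0, 15, 31])
skeleton = render(true_rows, true_cols, (32, 32))

best, history = evolve(skeleton, n_rows=4, n_cols=3)
print("initial best distance:", history[0])
print("final best distance:  ", history[-1])
```

Because the elite is carried over unchanged, the best distance never increases from one generation to the next. A pixel-wise distance gives no credit for a border that is close but not exact, so a practical fitness function would probably need to be more tolerant of small offsets. Searching over a variable number of rows and columns, as the phrase "candidate latent table structures" suggests, would also need a richer encoding than the fixed-length arrays used here.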
- Publication: arXiv e-prints
- Pub Date: April 2019
- DOI: 10.48550/arXiv.1904.01947
- arXiv: arXiv:1904.01947
- Bibcode: 2019arXiv190401947L
- Keywords: Computer Science - Neural and Evolutionary Computing
- E-Print: 8 pages, 5 figures. Published at IJCNN 2019