Diffusion-based Document Layout Generation
Abstract
We develop a diffusion-based approach for various document layout sequence generation. Layout sequences specify the contents of a document design in an explicit format. Our novel diffusion-based approach works in the sequence domain rather than the image domain in order to permit more complex and realistic layouts. We also introduce a new metric, Document Earth Mover's Distance (Doc-EMD). By considering similarity between heterogeneous categories document designs, we handle the shortcomings of prior document metrics that only evaluate the same category of layouts. Our empirical analysis shows that our diffusion-based approach is comparable to or outperforming other previous methods for layout generation across various document datasets. Moreover, our metric is capable of differentiating documents better than previous metrics for specific cases.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2023
- DOI:
- 10.48550/arXiv.2303.10787
- arXiv:
- arXiv:2303.10787
- Bibcode:
- 2023arXiv230310787H
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham