NaturalProofs: Mathematical Theorem Proving in Natural Language
Abstract
Understanding and creating mathematics using natural mathematical language (the mixture of symbolic and natural language used by humans) is a challenging and important problem for driving progress in machine learning. As a step in this direction, we develop NaturalProofs, a multi-domain corpus of mathematical statements and their proofs, written in natural mathematical language. NaturalProofs unifies broad coverage, deep coverage, and low-resource mathematical sources, allowing for evaluating both in-distribution and zero-shot generalization. Using NaturalProofs, we benchmark strong neural methods on mathematical reference retrieval and generation tasks which test a system's ability to determine key results that appear in a proof. Large-scale sequence models show promise compared to classical information retrieval methods, yet their performance and out-of-domain generalization leave substantial room for improvement. NaturalProofs opens many avenues for research on challenging mathematical tasks.
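The reference retrieval task described above (ranking candidate theorems and definitions by relevance to a given statement) can be sketched with a classical baseline of the kind the abstract compares against. The snippet below is a minimal, self-contained TF-IDF cosine-similarity ranker in pure Python; the toy statement and candidate references are hypothetical illustrations, not drawn from the NaturalProofs corpus, and the tokenizer is a deliberate simplification.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization: a deliberately simple stand-in.
    return text.lower().split()

def idf_table(tokenized_docs):
    # Smoothed inverse document frequency over the candidate references.
    df = Counter()
    for toks in tokenized_docs:
        df.update(set(toks))
    n = len(tokenized_docs)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(toks, idf):
    # Sparse TF-IDF vector as a dict from term to weight.
    tf = Counter(toks)
    return {t: (c / len(toks)) * idf.get(t, 0.0) for t, c in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(statement, references):
    # Rank candidate references by similarity to a theorem statement,
    # returning (index, score) pairs sorted from most to least relevant.
    tok_refs = [tokenize(r) for r in references]
    idf = idf_table(tok_refs)
    qvec = tfidf(tokenize(statement), idf)
    scored = [(i, cosine(qvec, tfidf(toks, idf)))
              for i, toks in enumerate(tok_refs)]
    return sorted(scored, key=lambda pair: -pair[1])

# Hypothetical toy data, not from the NaturalProofs corpus.
references = [
    "definition of prime number and integer divisibility",
    "triangle inequality for real numbers",
    "the sum of two odd primes is even",
]
statement = "every even integer greater than two is the sum of two primes"
ranking = retrieve(statement, references)
```

Neural retrievers benchmarked on the task replace the sparse TF-IDF vectors with learned dense embeddings, but the ranking-by-similarity structure is the same.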
Publication: arXiv e-prints
Pub Date: March 2021
arXiv: arXiv:2104.01112
Bibcode: 2021arXiv210401112W
Keywords: Computer Science - Information Retrieval; Computer Science - Machine Learning