Named Entity Annotation Schema for Geological Literature Mining in the Domain of Porphyry Copper Deposits
Abstract
With the development of natural language processing and deep learning, geological text data have become a vital resource and have attracted the attention of publishers, academic organizations, and domain scientists. Extracting information from unstructured literature remains an ongoing challenge, and deciding what kind of information needs to be extracted is a fundamental issue. This paper presents a workflow that uses a case-driven method to build an ontology model of porphyry copper deposits and a corresponding entity annotation schema. First, we select the Dexing porphyry copper deposit as a case to drive the construction of the ontology model. The text data associated with the Dexing porphyry copper deposit provide a series of entity instances, from which entity classes are then derived based on knowledge of the mineral deposit model. Second, a named entity annotation schema comprising 21 entity tokens is designed from the ontology model to represent core entity information, and it is used for text mining in the domain of porphyry copper deposits. Third, based on the annotation schema, two corpora are built from published literature on porphyry copper deposits for training a geological entity recognizer: a draft corpus of more than 200,000 words and a finely corrected corpus of 53,339 words. The performance of the geological entity recognizer and the statistical distribution of entities in the corpus demonstrate that the proposed workflow is an effective way to design an entity annotation schema and facilitate text data mining.
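To make the link between an annotation schema and recognizer training data concrete, the following is a minimal sketch, not the authors' implementation. It assumes annotations are stored as token-level spans `(start, end, label)` and converts them to the widely used BIO tagging format, then counts how often each entity class occurs, which is the kind of corpus statistic the abstract refers to. The labels in `SCHEMA` (`DEPOSIT`, `MINERAL`, `ROCK`, `LOCATION`) and the helper names `spans_to_bio` and `entity_distribution` are hypothetical, because the abstract does not list the 21 entity tokens.

```python
from collections import Counter

# Hypothetical subset of entity labels; the actual 21 entity tokens
# of the schema are not listed in the abstract.
SCHEMA = {"DEPOSIT", "MINERAL", "ROCK", "LOCATION"}


def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if label not in SCHEMA:
            raise ValueError(f"label {label!r} is not in the schema")
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span ({start}, {end})")
        # Flat NER assumption: entities may not overlap.
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another entity")
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the entity
    return tags


def entity_distribution(corpus):
    """Count annotated entities per label over (tokens, spans) sentences."""
    counts = Counter()
    for _tokens, spans in corpus:
        counts.update(label for _start, _end, label in spans)
    return counts


if __name__ == "__main__":
    # Illustrative sentence, not taken from the paper's corpus.
    tokens = ["The", "Dexing", "porphyry", "copper", "deposit",
              "hosts", "chalcopyrite", "."]
    spans = [(1, 5, "DEPOSIT"), (6, 7, "MINERAL")]
    print(spans_to_bio(tokens, spans))
    # ['O', 'B-DEPOSIT', 'I-DEPOSIT', 'I-DEPOSIT', 'I-DEPOSIT', 'O', 'B-MINERAL', 'O']
    print(entity_distribution([(tokens, spans)]))
```

Rejecting labels outside the schema and overlapping spans at conversion time is one way to keep a draft corpus consistent with the schema before manual correction; other projects may prefer BIOES tags or nested annotations instead.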
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFMIN12C0276W