A bag-of-concepts model improves relation extraction in a narrow knowledge domain with limited data
Abstract
This paper focuses on a traditional relation extraction task in the context of limited annotated data and a narrow knowledge domain. We explore this task with a clinical corpus consisting of 200 breast cancer follow-up treatment letters in which 16 distinct types of relations are annotated. We experiment with an approach to extracting typed relations called window-bounded co-occurrence (WBC), which uses an adjustable context window around entity mentions of a relevant type, and compare its performance with a more typical intra-sentential co-occurrence baseline. We further introduce a new bag-of-concepts (BoC) approach to feature engineering based on the state-of-the-art word embeddings and word synonyms. We demonstrate the competitiveness of BoC by comparing with methods of higher complexity, and explore its effectiveness on this small dataset.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2019
- DOI:
- arXiv:
- arXiv:1904.10743
- Bibcode:
- 2019arXiv190410743C
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Computation and Language;
- Computer Science - Information Retrieval;
- Statistics - Machine Learning
- E-Print:
- To appear in Proceedings of the Student Research Workshop at the North American Association for Computational Linguistics (NAACL) meeting 2019