Chinese NER Using Lattice LSTM
Abstract
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
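The lexicon-matching step mentioned in the abstract can be illustrated with a minimal sketch. It only enumerates the candidate words that a lattice would attach to a character sequence; it does not implement the LSTM or its gating. The function name `match_lexicon`, the `max_len` cutoff, the example sentence, and the toy lexicon are all assumptions made for illustration and are not taken from the record above.

```python
def match_lexicon(chars, lexicon, max_len=4):
    """Return (start, end, word) for every multi-character lexicon word in chars.

    `end` is exclusive. `max_len` is a hypothetical cap on word length.
    """
    spans = []
    n = len(chars)
    for start in range(n):
        # Only consider spans of at least two characters; single
        # characters are already covered by the character sequence.
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = "".join(chars[start:end])
            if word in lexicon:
                spans.append((start, end, word))
    return spans


# Illustrative input: a character sequence and a toy lexicon (assumed).
sentence = list("南京市长江大桥")
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}

lattice_words = match_lexicon(sentence, lexicon)
for start, end, word in lattice_words:
    print(start, end, word)
```

Overlapping matches such as "南京市" and "市长" are all kept rather than resolved into a single segmentation, which is the sense in which a lattice sidesteps segmentation errors; choosing among them is left to the model.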
- Publication: arXiv e-prints
- Pub Date: May 2018
- DOI: 10.48550/arXiv.1805.02023
- arXiv: arXiv:1805.02023
- Bibcode: 2018arXiv180502023Z
- Keywords: Computer Science - Computation and Language
- E-Print: Accepted at ACL 2018 as Long paper