From Word Segmentation to POS Tagging for Vietnamese

doi:10.48550/arXiv.1711.04951


This paper presents an empirical comparison of two strategies for Vietnamese Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy, in which the output of a word segmenter is used as the input to a POS tagger, and (ii) a joint strategy, in which a combined segmentation and POS tag is predicted for each syllable. We also compare state-of-the-art (SOTA) feature-based and neural network-based models. On the benchmark Vietnamese treebank (Nguyen et al., 2009), experimental results show that the pipeline strategy yields higher POS tagging scores on unsegmented text than the joint strategy, and that the highest accuracy is obtained with a feature-based model.
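The per-syllable labels used by the joint strategy can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the paper: segmented text joins the syllables of a multi-syllable word with underscores, and each joint label is a boundary prefix ("B" for a word-initial syllable, "I" for a word-internal one) combined with the word's POS tag. The helper names `to_joint_tags` and `from_joint_tags` are hypothetical.

```python
from typing import List, Tuple


def to_joint_tags(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn word-level (word, POS) pairs into syllable-level joint labels.

    Assumption: syllables of a multi-syllable word are joined with "_".
    Hypothetical label scheme: the first syllable gets "B-<POS>" and every
    following syllable of the same word gets "I-<POS>".
    """
    joint = []
    for word, pos in tagged_words:
        for i, syllable in enumerate(word.split("_")):
            prefix = "B" if i == 0 else "I"
            joint.append((syllable, f"{prefix}-{pos}"))
    return joint


def from_joint_tags(joint: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Invert to_joint_tags: recover both word boundaries and POS tags."""
    words: List[Tuple[str, str]] = []
    for syllable, label in joint:
        prefix, pos = label.split("-", 1)
        # Start a new word on "B", or on an "I" that cannot extend the previous word.
        if prefix == "B" or not words or words[-1][1] != pos:
            words.append((syllable, pos))
        else:
            prev_word, prev_pos = words[-1]
            words[-1] = (prev_word + "_" + syllable, prev_pos)
    return words


if __name__ == "__main__":
    # "Học sinh học sinh học" ("Students study biology"), segmented and tagged.
    sentence = [("Học_sinh", "N"), ("học", "V"), ("sinh_học", "N")]
    labels = to_joint_tags(sentence)
    print(labels)
    print(from_joint_tags(labels))
```

Decoding one sequence of joint labels yields the segmentation and the tagging together, whereas in the pipeline strategy the segmenter fixes word boundaries before the tagger sees the input.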

Publication: arXiv e-prints
Pub Date: November 2017
DOI: 10.48550/arXiv.1711.04951
arXiv: arXiv:1711.04951
Bibcode: 2017arXiv171104951N
Keywords: Computer Science - Computation and Language
E-Print: To appear in Proceedings of the 15th Annual Workshop of the Australasian Language Technology Association, ALTA 2017
