Multi-Unit Directional Measures of Association: Moving Beyond Pairs of Words
Abstract
This paper formulates and evaluates a series of multi-unit measures of directional association that build on the pairwise ΔP measure and quantify association in sequences of varying length and type of representation. Multi-unit measures face an additional segmentation problem: once the implicit length constraint of pairwise measures is abandoned, association measures must also identify the borders of meaningful sequences. This paper takes a vector-based approach to the segmentation problem, using 18 distinct measures, each of which describes a different aspect of multi-unit association. An examination of these measures across eight languages shows that they are stable across languages and that each produces a distinct ranking of associated sequences. Taken together, these measures expand corpus-based approaches to association by generalizing across varying lengths and types of representation.
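For readers unfamiliar with the pairwise baseline, the following is a minimal sketch of the standard ΔP computation for adjacent word pairs, where ΔP(y|x) = P(y|x) − P(y|¬x) is read off a 2×2 contingency table of bigram counts. It illustrates only the pairwise measure that the paper generalizes, not the paper's 18 multi-unit measures. The function name `delta_p`, the helper `_ratio`, and the toy corpus are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple


def _ratio(num: int, den: int) -> float:
    """Safe division: return 0.0 when the denominator is empty (hypothetical convention)."""
    return num / den if den else 0.0


def delta_p(tokens: Sequence[str], x: str, y: str) -> Tuple[float, float]:
    """Return (ΔP(y|x), ΔP(x|y)) for the ordered pair x -> y over adjacent tokens."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    # 2x2 contingency table over adjacent pairs (w1, w2):
    a = sum(1 for w1, w2 in bigrams if w1 == x and w2 == y)  # x followed by y
    b = sum(1 for w1, w2 in bigrams if w1 == x and w2 != y)  # x followed by not-y
    c = sum(1 for w1, w2 in bigrams if w1 != x and w2 == y)  # not-x followed by y
    d = n - a - b - c                                         # neither x first nor y second
    # Forward direction: how well x predicts the following word y.
    forward = _ratio(a, a + b) - _ratio(c, c + d)
    # Backward direction: how well y predicts the preceding word x.
    backward = _ratio(a, a + c) - _ratio(b, b + d)
    return forward, backward


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    print(delta_p(corpus, "the", "cat"))
```

Both directions come from the same table but are normalized by different marginals, so they generally differ; in the toy corpus above, "the" predicts "cat" less strongly (about 0.67) than "cat" predicts a preceding "the" (about 0.83). This asymmetry is what makes ΔP directional, and it is the property the multi-unit measures carry over to longer sequences.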
- Publication: arXiv e-prints
- Pub Date: April 2021
- DOI: 10.48550/arXiv.2104.01297
- arXiv: arXiv:2104.01297
- Bibcode: 2021arXiv210401297D
- Keywords: Computer Science - Computation and Language
- E-Print: International Journal of Corpus Linguistics (2018)