Word frequency-rank relationship in tagged texts
Abstract
We analyze the frequency-rank relationship in sub-vocabularies corresponding to three grammatical classes (nouns, verbs, and others) in a collection of literary works in English whose words have been automatically tagged according to their grammatical role. Comparing against a null hypothesis that assumes the words of each class are uniformly distributed across the frequency-ranked vocabulary of the whole work, we find statistically significant differences between the three classes. These results suggest that frequency-rank relationships may reflect linguistic features associated with grammatical function.
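The null hypothesis described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure. It assumes tokens arrive as (word, tag) pairs, treats each distinct pair as one vocabulary entry, ranks the whole vocabulary by frequency, and compares the mean rank of one class with the mean ranks of equally sized sets of ranks drawn uniformly at random. The function names, the choice of mean rank as the test statistic, and the Monte Carlo comparison are all assumptions made for illustration.

```python
import random
from collections import Counter


def rank_vocabulary(tagged_tokens):
    """Map each (word, tag) entry to its frequency rank (1 = most frequent).

    Ties are broken by sorting the entries themselves (an arbitrary choice
    made here for reproducibility).
    """
    counts = Counter(tagged_tokens)
    ordered = sorted(counts, key=lambda entry: (-counts[entry], entry))
    return {entry: rank for rank, entry in enumerate(ordered, start=1)}


def class_mean_rank(ranks, tag):
    """Mean frequency rank of all vocabulary entries carrying the given tag."""
    class_ranks = [rank for (_, t), rank in ranks.items() if t == tag]
    return sum(class_ranks) / len(class_ranks)


def null_mean_ranks(vocab_size, class_size, trials=1000, seed=0):
    """Mean ranks of random classes placed uniformly across the vocabulary.

    Each trial draws class_size distinct ranks from 1..vocab_size without
    replacement, which is one way to realise a uniform null (assumption).
    """
    rng = random.Random(seed)
    all_ranks = range(1, vocab_size + 1)
    return [sum(rng.sample(all_ranks, class_size)) / class_size
            for _ in range(trials)]


def two_sided_p(observed, null_values):
    """Fraction of null values at least as far from the null mean as observed."""
    center = sum(null_values) / len(null_values)
    extreme = sum(1 for v in null_values
                  if abs(v - center) >= abs(observed - center))
    return extreme / len(null_values)
```

Under this uniform null the expected mean rank of any class is (V + 1) / 2 for a vocabulary of V entries, so an observed class mean far from that value, relative to the spread of the simulated means, would indicate that the class is concentrated at high or low frequencies.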
- Publication:
- Physica A: Statistical Mechanics and its Applications
- Pub Date:
- July 2021
- DOI:
- 10.1016/j.physa.2021.126020
- arXiv:
- arXiv:2102.10992
- Bibcode:
- 2021PhyA..57426020C
- Keywords:
- Frequency-rank statistics;
- Grammatical function;
- Linguistic regularities;
- Language processing;
- Quantitative linguistics;
- Computer Science - Computation and Language