Word Length Frequency and Distribution in English: Observations, Theory, and Implications for the Construction of Verse Lines
Abstract
Recent observations in the theory of verse and empirical metrics have suggested that constructing a verse line involves a pattern-matching search through a source text, and that the number of found elements (complete words totaling a specified number of syllables) is given by dividing the total number of words by the mean number of syllables per word in the source text. This paper makes the latter point mathematically explicit and, in the course of the demonstration, shows that word length frequency totals in English output are distributed geometrically (previous researchers reported an adjusted Poisson distribution), and that the sequential distribution is random at the global level, with significant non-randomness in the fine structure. Data are reported from a corpus of just under two million words and a syllable-count lexicon of 71,000 word-forms. The pattern-matching theory is shown to be internally coherent, and some of the analytic techniques described here are observed to form a satisfactory test for regular (isometric) lineation in a text.
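The counting claim in the abstract can be illustrated with a minimal sketch. It assumes that a "found element" is a starting word position from which a run of consecutive whole words sums to exactly the target number of syllables, and that every word has at least one syllable. The function names and the toy input are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def count_exact_runs(syllable_counts, target):
    """Count start positions from which whole words sum to exactly `target` syllables."""
    # Prefix sums: boundaries[i] is the syllable offset at which word i begins.
    boundaries = [0] + list(accumulate(syllable_counts))
    boundary_set = set(boundaries)
    # A run starting at word i matches when offset + target is also a word boundary.
    return sum(1 for start in boundaries[:-1] if start + target in boundary_set)


def estimated_matches(syllable_counts):
    """Estimate from the abstract: total words divided by mean syllables per word."""
    n_words = len(syllable_counts)
    mean_syllables = sum(syllable_counts) / n_words
    return n_words / mean_syllables


# Toy input, invented for illustration (not data from the paper).
toy = [1, 2, 1, 3, 1, 2]
print(count_exact_runs(toy, 3), estimated_matches(toy))
```

On an input this small the comparison is only a sanity check; the relationship stated in the abstract concerns a large source text, where edge effects at the end of the sequence become negligible.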
- Publication: arXiv e-prints
- Pub Date: August 1998
- arXiv: arXiv:cmp-lg/9808004
- Bibcode: 1998cmp.lg....8004A
- Keywords: Computation and Language; Computer Science - Computation and Language
- E-Print: 32 pages, 11 figures, uses epsf.tex