ABSTRACT

This chapter is concerned with the fit of some Markov chain models to the type-token relationship in literary texts. The type-token relationship, one of the more interesting problems of statistical linguistics, concerns the number V_n of different words (types) appearing in a text of n words (tokens). The chapter describes a maximum likelihood approach to estimating the model parameters; since the distributional approximations yield Gaussian distributions, the method resembles a weighted least squares estimation procedure. Modeling arguments suggest that one of the model parameters may be interpreted as the body of “functional” words, i.e., pronouns, prepositions, conjunctions, auxiliary verbs, articles, and interjections, which tend to be independent of context and are likely to be used by all authors writing in the language concerned.
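
To make the quantity V_n concrete, the sketch below computes the empirical type-token curve of a text, i.e., the number of distinct words among the first n tokens for each n. It is illustrative only: the lowercase, regular-expression tokenization is an assumption made for the example and is not the chapter's procedure, and no model fitting is performed here.

```python
# Illustrative sketch (not the chapter's method): compute the empirical
# type-token curve V_1, V_2, ..., V_N of a text, where V_n is the number
# of distinct words (types) among the first n tokens.
import re

def type_token_curve(text):
    # Naive tokenization, assumed only for this example:
    # lowercase the text and keep alphabetic runs (with apostrophes).
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []                      # curve[n-1] holds V_n
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve

if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat by the cat"
    for n, v_n in enumerate(type_token_curve(sample), start=1):
        print(f"n={n:2d}  V_n={v_n}")
```

The connection between the Gaussian approximation and weighted least squares mentioned above can be sketched generically as follows; the mean and variance functions \mu_n(\theta) and \sigma_n^2(\theta) are placeholder notation for whatever the chapter's Markov chain model implies, not its actual formulas, and the observations at text lengths n_1, \dots, n_k are treated as approximately independent purely for the sake of the sketch. If, approximately,

\[
V_n \sim \mathcal{N}\!\left(\mu_n(\theta),\, \sigma_n^2(\theta)\right),
\]

then the negative log-likelihood is, up to an additive constant,

\[
-\log L(\theta) \;=\; \frac{1}{2}\sum_{j=1}^{k}\left[\frac{\bigl(V_{n_j}-\mu_{n_j}(\theta)\bigr)^2}{\sigma_{n_j}^2(\theta)} \;+\; \log \sigma_{n_j}^2(\theta)\right] + \text{const},
\]

so that, where the variance terms depend only weakly on \theta, maximizing the likelihood amounts to minimizing the weighted sum of squared residuals, which is a weighted least squares criterion.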