ABSTRACT

In 1887 the polymath T. C. Mendenhall published an article in Science titled “The Characteristic Curves of Composition,” which is one of the earliest examples of quantitative stylistics and also presents one of the first text visualizations. Mendenhall thought that different authors would have distinctive curves of word-length frequencies that could help with authorship attribution, much as a spectroscope can be used to identify elements. Figure 1.1 shows an example, the characteristic curve of Oliver Twist. Mendenhall counted the length in characters of each of the first 1,000 words and then graphed the number of words of each length. Thus one can see that just under fifty of the first 1,000 words are one letter long.
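
The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of Mendenhall's own method: the function name characteristic_curve, the n_words parameter, and the tokenization rule (a word is a run of letters, and apostrophes are not counted toward its length) are all assumptions made for the example.

```python
import re
from collections import Counter


def characteristic_curve(text, n_words=1000):
    """Return {word length: number of words} for the first n_words words.

    Assumption: a word is a run of ASCII letters, optionally joined by
    internal apostrophes (so "don't" is one word).
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)[:n_words]
    # Count letters only, so "don't" contributes a length of 4.
    lengths = (sum(ch.isalpha() for ch in word) for word in words)
    counts = Counter(lengths)
    # Sort by word length so the result reads like the x-axis of the curve.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Arbitrary sample sentence, used only to show the output shape.
    sample = "A cat sat on the mat and I saw it"
    for length, count in characteristic_curve(sample).items():
        # Crude text plot: one '#' per word of that length.
        print(f"{length:2d} {'#' * count} ({count})")
```

Passing the full text of a novel in place of the sample sentence, with n_words left at 1,000, would give the counts needed to draw a curve of the kind Mendenhall graphed; a different tokenization rule would shift the counts slightly.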