ABSTRACT

The form of a text can certainly be analyzed statistically; in this chapter we ask how much such analysis tells us about the text's content.

Texts can be statistically fingerprinted, and it is well known that this is useful in studies of authorship. Some work of this kind is included in the survey of correspondence analysis of text that makes up the first half of this chapter. In forensic science, however, a text fingerprint is rarely as important as a voice “fingerprint,” a shoeprint, or any of many other indicators of authorship. From this we draw one conclusion: text is of little use in forensics because the amount of text available (the number of words or characters) is usually very limited. The benefits of automated statistical analysis of text come from having large quantities of text.