ABSTRACT

This chapter addresses the ways in which computers can be used to assist the work of the forensic linguist. The emphasis is on computational document comparison, and in particular on identifying high levels of similarity between all or parts of two documents, which the forensic linguist can use to decide questions of shared or suspicious authorship. One of the central problems discussed is how to handle very large quantities of data efficiently and reliably. The need for computer assistance has grown rapidly in the twenty-first century, with most companies and educational institutions holding their data in electronic form and often making it openly available on the Internet. There is a need both to monitor for misuse of such electronic material and to check databases for prior work or duplicated material. Another major area examined is the need for flexibility in any computer program designed to identify similarity. Matching consecutive sequences of words finds only unmodified copying (cut and paste); more sophisticated modifications involve insertion, deletion, re-ordering or thesaural substitution, and detecting these requires word-level searching. Arising from this, the power of using simple lists of particular words or parts of words, rather than undertaking full grammatical parsing, is explored and explained.