ABSTRACT

Authorship attribution studies, which aim to discover the 'stylistic fingerprints' of writers by quantifying a range of features in their writing, have begun to use tag sequences in addition to word frequency counts. Tag sequences are also beginning to be used in literary text analysis. This study suggests that they can likewise help uncover the fingerprints of English as a Foreign Language (EFL) learners. The analysis is based on four corpora of similar size. Three of the corpora are from the International Corpus of Learner English (ICLE) database and contain argumentative essays written by Dutch (DU), Finnish (FI) and French-speaking (FR) advanced learners of English. The fourth is a native speaker corpus, extracted from the Louvain Corpus of Native English Essays (LOCNESS) database, and covers the same type of writing by American students (NS). Standard UNIX tools were used to extract the tag trigrams from the tagged corpora.
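To illustrate what extracting tag trigrams involves, the following is a minimal Python sketch, not the UNIX pipeline used in the study. It assumes that each token in the tagged corpus is written as word_TAG, that tokens are separated by whitespace, and that each line holds one sentence, so trigrams do not cross line boundaries. The function name tag_trigrams, the separator, and the tags in the sample sentence are illustrative assumptions and are not taken from the corpora themselves.

```python
from collections import Counter


def tag_trigrams(tagged_text, sep="_"):
    """Count tag trigrams in text whose tokens look like word<sep>TAG.

    Assumption: one sentence per line; trigrams never span two lines.
    """
    counts = Counter()
    for line in tagged_text.splitlines():
        # Keep only the tag part of each token (the text after the last separator).
        tags = [tok.rsplit(sep, 1)[1] for tok in line.split() if sep in tok]
        # Slide a window of three over the tag sequence of this line.
        counts.update(zip(tags, tags[1:], tags[2:]))
    return counts


# Hypothetical tagged sentence, used only to show the input format.
sample = "The_AT cat_NN1 sat_VVD on_II the_AT mat_NN1 ._."

for trigram, freq in tag_trigrams(sample).most_common():
    print(" ".join(trigram), freq)
```

Running the same function over each of the four corpora would give one frequency table per corpus, which is the kind of output that can then be compared across learner groups.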