Corpus linguists working with spoken data have long pointed out that the absence of audio and video recordings causes problems in the analysis of corpus data. De Cock (1998), for example, in a discussion of the sequence ‘you know’, argues that it is virtually impossible to decide from the orthographic transcript alone whether ‘you know’ has a literal or a formulaic meaning. Much depends on the prosody and placement of the sequence, which often assumes a phraseological meaning.