ABSTRACT

The lexical bundles approach is a fully corpus-driven methodology in that it '[begins] with simple word forms and [gives] priority to frequency to identify recurrent word sequences'. The approach processes each word in a corpus, identifying and tallying every multi-word sequence of a specified length that is attested in it. Lexical bundle researchers have proposed that these units, which are identified purely on the basis of frequent recurrence in corpora, function as 'important building blocks of discourse, associated with basic communicative functions'. The lexical bundles identified through this corpus-driven methodology typically have characteristics that distinguish them from other types of formulaic language: they are extremely common, as defined by distributional criteria covering both frequency of occurrence and distribution across speakers/writers; they are typically incomplete structural units, often bridging two structural units; and they are non-idiomatic in meaning and not particularly perceptually salient. After their identification in a corpus, lexical bundles are typically described according to their structural makeup and their typical discourse functions.
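The identification procedure described above can be illustrated with a minimal Python sketch. It is not the implementation used in any particular study: the function name `extract_bundles`, the input format (a dictionary mapping a text identifier to a list of already tokenized words), and the default thresholds for normalized frequency and number of texts are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_per_million=40.0, min_texts=5):
    """Return recurrent n-word sequences that meet frequency and range cut-offs.

    texts: dict mapping a text id to a list of word tokens (hypothetical format).
    n: length of the word sequences to tally.
    min_per_million, min_texts: illustrative thresholds, not values from the source.
    """
    counts = Counter()          # raw frequency of each n-word sequence
    ranges = defaultdict(set)   # ids of the texts in which each sequence occurs
    total_words = 0

    for text_id, tokens in texts.items():
        total_words += len(tokens)
        # Slide a window of n words across the text, one word at a time.
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            ranges[gram].add(text_id)

    bundles = {}
    for gram, freq in counts.items():
        # Normalize the raw count to a rate per million words.
        rate = freq * 1_000_000 / total_words
        if rate >= min_per_million and len(ranges[gram]) >= min_texts:
            bundles[" ".join(gram)] = (freq, len(ranges[gram]))
    return bundles
```

The function returns each qualifying sequence with its raw frequency and the number of texts in which it occurs. No structural or semantic information is consulted at any point, which is consistent with the description of the approach as corpus-driven; the structural and functional description of the bundles would follow as a separate analytical step.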