ABSTRACT

In formal language theory, a regular expression (regex) represents a (potentially infinite-size) set of strings which constitute a regular language (Aho and Ullman, 1972). In practice, regular expression matching (REM) is the process where a stream of characters is searched for occurrences of all possible strings represented by some regex; we say the regex is matched against the input stream. REM has traditionally played a key role in text processing and database filtering. More recently, it has become an essential component in network intrusion detection systems such as Bro (Paxson, 1999) and Snort (Sourcefire, 2011) for deep packet inspection (DPI), as well as in bioinformatics for matching biological sequences (Betel and Hogue, 2002; Fourment and Gillings, 2008).