ABSTRACT

Let us look at a concrete example to illustrate the kind of problem that a regular expression can solve. Suppose you want to build a spam filter, and you need a way to filter out any email message that contains the word Viagra. Searching for the literal string “Viagra” or “viagra” is easy enough. But clever spammers will try to disguise the word by substituting characters, adding characters, or removing letters. We might expect any of the following variations:

• v.i.a.g.r.a

• v1agra

• vi_ag_ra

• vi@gr@

• ViAgRa

• Viagr

To build our spam filter, we would not want to list every possible appearance of Viagra. Beyond the six examples listed here, there are thousands of others that someone might come up with. Instead, we could define a single regular expression that covers many or most of the possibilities.
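
As a rough illustration of that idea, the sketch below uses Python's re module to build one pattern that matches all six variations listed above. The pattern itself, the names SEP, VIAGRA_PATTERN, and looks_like_spam, and the particular choice of separator and substitute characters are assumptions made for this example; a real filter would need a broader and better-tested pattern.

```python
import re

# Hypothetical separator: zero or more whitespace, dot, underscore, or hyphen
# characters that a spammer might insert between letters.
SEP = r"[\s._-]*"

# Hypothetical pattern (an assumption for illustration, not a complete filter):
#   [i1!]  accepts "i" or a look-alike substitute
#   [a@4]  accepts "a" or a look-alike substitute
#   the final letter is optional, so the truncated "Viagr" still matches
#   \b and (?!\w) keep the match from starting or ending inside a longer word
VIAGRA_PATTERN = re.compile(
    r"\bv" + SEP + r"[i1!]" + SEP + r"[a@4]" + SEP
    + r"g" + SEP + r"r" + SEP + r"[a@4]?(?!\w)",
    re.IGNORECASE,  # covers mixed case such as "ViAgRa"
)


def looks_like_spam(text: str) -> bool:
    """Return True if the text contains a disguised form of the word."""
    return VIAGRA_PATTERN.search(text) is not None


if __name__ == "__main__":
    samples = ["v.i.a.g.r.a", "v1agra", "vi_ag_ra", "vi@gr@", "ViAgRa", "Viagr"]
    for sample in samples:
        print(sample, looks_like_spam(sample))
```

Allowing separators and a missing final letter makes the pattern more permissive, which raises the risk of false positives. The trailing (?!\w) check is one way to limit that: without it, a harmless phrase such as "via group" would match, because "via gr" fits the pattern up to the optional last letter.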