ABSTRACT

This chapter introduces the profile hidden Markov model (PHMM). A PHMM can be viewed as a series of HMMs in which, in effect, a new B matrix (the observation probability matrix) is defined at each offset in the training data. The chapter works through a few simple examples that illustrate important aspects of the PHMM technique. It focuses on sequence alignment, since that is the most challenging part of training a PHMM. It also considers realistic applications of PHMMs to problems in information security, including malware detection and masquerade detection. In addition, the chapter covers the construction of a multiple sequence alignment (MSA) from a collection of pairwise alignments and shows how to generate the PHMM matrices from an MSA. Finally, it describes PHMM scoring, which is slightly more complex than scoring with an HMM, primarily because of the greater complexity of the state transitions.
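
To make the idea of "a new B matrix at each offset" concrete, the following minimal Python sketch estimates one observation distribution per column of a small MSA. It is an illustration under stated assumptions rather than the chapter's own procedure: the function name `position_emissions`, the toy alignment, the gap symbol `-`, and the add-one pseudocount are all hypothetical choices, and every column is treated as a match state (a full PHMM would also distinguish insert and delete states).

```python
from collections import Counter


def position_emissions(msa, alphabet, gap="-", pseudocount=1.0):
    """Return one emission distribution (a dict) per column of the alignment.

    Assumption: every column is treated as a match state, and counts are
    smoothed with a simple additive pseudocount so no symbol has probability 0.
    """
    if not msa:
        raise ValueError("msa must contain at least one sequence")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")

    emissions = []
    for col in range(length):
        # Count the symbols observed in this column, ignoring gaps.
        counts = Counter(row[col] for row in msa if row[col] != gap)
        total = sum(counts[s] for s in alphabet) + pseudocount * len(alphabet)
        emissions.append(
            {s: (counts[s] + pseudocount) / total for s in alphabet}
        )
    return emissions


# Hypothetical toy alignment of four sequences over the alphabet ACGT.
msa = ["ACG-", "ACGT", "A-GT", "TCGT"]
B = position_emissions(msa, "ACGT")

for offset, dist in enumerate(B):
    print(offset, {s: round(p, 3) for s, p in dist.items()})
```

Each entry of `B` plays the role of a position-specific row of emission probabilities, in contrast to a standard HMM, where a single B matrix is shared across all positions.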