ABSTRACT

This chapter considers three applications of hidden Markov models (HMMs). First, it discusses the use of HMMs in English text analysis. It then turns to an information security application, the malware detection problem. Specifically, the chapter shows that HMMs can detect types of malware that cannot possibly be detected using standard signature-based techniques. Finally, it builds on the English text analysis application to show that HMMs are a powerful tool for breaking classic substitution ciphers. Among other things, this last example highlights the potential benefit of multiple random restarts when training an HMM. The malware application illustrates the strength of machine learning in this domain: although the viruses analyzed were not detectable using signature scanning, they were easily distinguished by HMMs. The chapter also shows conclusively that a metamorphic generator can evade both signature detection and the HMM-based approach. Along the way, it describes code obfuscation in NGVCK and a signature-proof metamorphic generator.