ABSTRACT

Computational biology, the application of computational and mathematical techniques to problems inspired by biology, has witnessed an unprecedented expansion over the last few years. The wide availability of genomic, transcriptomic, proteomic, metabolomic, and interactomic data has fostered the development of computational techniques for their analysis. Combinatorial pattern matching is one of the main sources of algorithmic solutions to the problems that arise in their analysis. This is a text on combinatorial pattern matching algorithms, with special

emphasis on computational biology. Pattern matching is well known in computational biology, not only because of biological sequence alignment. Data structures such as suffix trees and suffix arrays were developed within the combinatorial pattern matching research community to efficiently solve specific problems that arise in computational biology. This book provides an organized and comprehensive view of the whole field of combinatorial pattern matching from a computational biology perspective and addresses specific pattern matching problems within and between sequences, trees, and graphs. Much of the material presented on the book is only available in the specialized research literature. The book is structured around the specific algorithmic problems that arise

when dealing with those structures that are commonly found in computational biology, namely: biological sequences (such as DNA, RNA, and protein sequences), trees (such as phylogenetic trees and RNA structures), and graphs (such as phylogenetic networks, metabolic pathways, protein interaction networks, and signaling pathways). The emphasis throughout this book is on the search for patterns within and between biological sequences, trees, and graphs, with the understanding of exact (rather than approximate) occurrences as patterns and pairwise (rather than multiple) comparison of structures. There is also a strong emphasis on phylogenetic trees and networks as examples of trees and graphs in computational biology. For each of these structures (sequences, trees, and graphs), a clear distinc-

tion is made between the problems that arise in the analysis of one structure (finding patterns within a structure) and in the comparative analysis of two or more structures (finding patterns common to two structures and aligning these structures).