ABSTRACT

This chapter discusses the evolutionary mechanisms that give rise to changes in protein sequences and the challenges these changes pose for aligning protein sequences. It considers accurate methods for automatically aligning protein sequences, based on dynamic programming. The chapter assesses the performance of existing database search methods in recognizing homologs, i.e. evolutionarily related sequences. It also considers the challenges and strategies involved in multiple sequence alignment. The chapter describes the concepts underlying automated methods for comparing protein sequences and discusses the reliability of these methods. The best way of understanding how dynamic programming works is to consider a simple computer implementation of it applied to sequence alignment. The chapter also describes how the percentage of relatives identified by any of the sequence search methods can be increased by scanning against protein family libraries or intermediate sequence libraries rather than the basic sequence repositories such as GenBank, EMBL or TrEMBL.
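
As an illustration of dynamic programming applied to sequence alignment, the following is a minimal sketch of a global pairwise alignment in the style of Needleman and Wunsch. It is not taken from the chapter: the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for simplicity. Practical protein alignment would normally replace them with a substitution matrix and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b).

    Illustrative scoring only: identical residues score `match`, different
    residues score `mismatch`, and every gap position costs `gap`.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    # Fill the matrix: each cell extends the best of three smaller alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] opposite a gap
                score[i][j - 1] + gap,      # b[j-1] opposite a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
    print(best)
    print(top)
    print(bottom)
```

The matrix is filled in time proportional to the product of the two sequence lengths, and the traceback reports only one of possibly several equally scoring alignments.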