ABSTRACT

Algorithms based on multiple sequence alignments (MSAs) usually generate more accurate alignments than pairwise methods and can help to identify important mutations. MSAs also play an important role in reconstructing evolutionary relationships between sequences and in deciphering the domain structure of proteins. A profile is a concise, efficient and powerful representation of an MSA. It consists of the distribution of amino acids at each position along the MSA and can be used to compute the probability that another sequence, once aligned to the MSA, belongs to the same family. Profiles are usually converted to position-specific scoring matrices, which are then used to search sequence databases. Some applications add a profile entry for gaps, or extend the profile with secondary structure information taken either from known protein structures or from predictions.
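As an illustration of the profile concept described above, the following minimal Python sketch builds a per-column amino acid distribution from a toy MSA, converts it to a log-odds position-specific scoring matrix, and scores an aligned sequence. The function names (`build_profile`, `to_pssm`, `score`), the add-one pseudocount, the uniform background distribution, and the zero contribution for gap characters are all assumptions made for this example and are not taken from any particular tool.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Return one amino acid distribution (dict) per MSA column.

    Gap characters are ignored when counting. The pseudocount
    (an assumption of this sketch) keeps every probability non-zero.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def to_pssm(profile, background=None):
    """Convert a profile to log2-odds scores against a background.

    A uniform background is assumed here for simplicity; real
    applications typically use observed amino acid frequencies.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return [
        {aa: math.log2(column[aa] / background[aa]) for aa in AMINO_ACIDS}
        for column in profile
    ]


def score(pssm, aligned_seq):
    """Sum the per-column scores of a sequence aligned to the MSA.

    Gaps and unknown characters contribute 0 (an assumption).
    """
    if len(aligned_seq) != len(pssm):
        raise ValueError("sequence must span every column of the MSA")
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, aligned_seq))


# Toy alignment: three sequences, four columns.
toy_msa = ["ACDA", "ACEA", "ACDA"]
toy_profile = build_profile(toy_msa)
toy_pssm = to_pssm(toy_profile)
family_like = score(toy_pssm, "ACDA")
unrelated = score(toy_pssm, "WWWW")
```

In this toy case the sequence that matches the alignment receives a higher score than the unrelated one, which is the behaviour a profile search relies on.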