ABSTRACT

In this chapter, the authors present their implementation of a sequence comparison algorithm on the data-parallel Connection Machine CM-2, manufactured by Thinking Machines Corporation. Comparing a new gene or protein sequence against a database of known sequences has become a standard tool in the analysis of protein structure and function. Sequence similarity between two proteins may permit a function to be ascribed to an uncharacterized gene product, and the study of conserved regions among several related proteins may identify an important structural feature, such as an enzyme active site. The authors develop methods for multiple sequence alignment, consensus pattern identification, comparison of protein structures, and the prediction of structure from sequence. Parallel instructions are passed to a sequencer on the CM-2, which then broadcasts the corresponding low-level operations to the processors. The appropriate algorithm for searching a database is that of T. F. Smith and Michael S. Waterman, which locates the most similar pair of subsequences (the optimal local alignment) between two sequences.
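
As an illustration of the Smith and Waterman recurrence named above, the following is a minimal serial sketch in Python. It is not the authors' CM-2 implementation and does not model the data-parallel execution. The function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen for the example. A real protein search would normally use a substitution matrix rather than a simple match/mismatch score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, (i, j) end position).

    Hypothetical helper for illustration only. Uses a linear gap
    penalty and keeps just two rows of the dynamic-programming table.
    Positions are 1-based indices into a and b; (0, 0) means no match.
    """
    prev = [0] * (len(b) + 1)          # row i-1 of the score table
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # row i of the score table
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_pos = curr[j], (i, j)
        prev = curr
    return best, best_pos


if __name__ == "__main__":
    score, end = smith_waterman_score("TTACGTT", "GGACGGG")
    print(score, end)
```

In a database search, a function of this kind would be evaluated once per database entry against the query, and the entries ranked by score. How that work is distributed across the CM-2 processors is the subject of the chapter and is not represented in this sketch.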