ABSTRACT

The pattern of secondary structure elements of a protein, the sites, if any, where it is posttranslationally modified, the cellular compartments in which it resides, and many other functional features are specified by its amino acid sequence. The methods described here share a common idea: rules are extracted from a set of proteins known to share a specific feature and then applied to proteins for which that feature is unknown. The task is therefore to infer one or more rules from a training set of proteins sharing a given property. If the rules are sufficiently general, they can be used to predict the presence of that property in other proteins.
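As a minimal sketch of this idea, the Python fragment below infers a very simple rule (a position-specific residue pattern) from a toy training set and applies it to a new sequence. The function names infer_pattern and predict, the training segments, and the query sequence are all hypothetical and chosen only for illustration. The sketch assumes that the training segments are already aligned and of equal length, and it is not meant to represent any specific method.

```python
import re


def infer_pattern(aligned_sites):
    """Infer a regular expression from equal-length, pre-aligned segments.

    Each position of the pattern accepts exactly the residues observed
    at that position in the training set (a deliberately naive rule).
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all training segments must have the same length")
    parts = []
    for column in zip(*aligned_sites):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return re.compile("".join(parts))


def predict(pattern, sequence):
    """Return 0-based start positions of non-overlapping pattern matches."""
    return [match.start() for match in pattern.finditer(sequence)]


# Hypothetical training set: short segments assumed to share a feature.
training_sites = ["NGT", "NVS", "NAT"]
rule = infer_pattern(training_sites)

# Apply the inferred rule to a hypothetical uncharacterized sequence.
hits = predict(rule, "MKNGSLLNPTQ")
```

A rule built this way accepts only residues already seen in the training set, so whether it generalizes depends on how large and representative that set is.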