ABSTRACT

Protein identification matches mass spectrometric data against collections of protein sequences. This procedure has become an essential part of proteomics-based biological research [1]. The mass spectrometric data are represented as a set of observed pairs, each consisting of a signal intensity and a mass-to-charge ratio. These pairs are compared directly with the corresponding pairs expected for a subset of the proteins in the sequence collection. The scores that result from the comparison are then analyzed and clustered to find the set of protein sequences that best fits the experimental data.
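
The comparison step described above can be illustrated with a minimal sketch. It is not the scoring method of any particular search engine: the function names (match_score, rank_candidates), the fixed mass-to-charge tolerance, and the score itself (the fraction of observed intensity explained by expected values) are all assumptions chosen for illustration, and the subsequent analysis and clustering of scores is omitted.

```python
from bisect import bisect_left


def match_score(observed, expected_mz, tol=0.5):
    """Return the fraction of observed intensity explained by expected m/z values.

    observed    -- list of (mz, intensity) pairs from the spectrum
    expected_mz -- list of m/z values expected for one candidate protein
    tol         -- absolute m/z tolerance (hypothetical default)
    """
    expected = sorted(expected_mz)
    total = sum(intensity for _, intensity in observed)
    if total == 0 or not expected:
        return 0.0

    matched = 0.0
    for mz, intensity in observed:
        # The closest expected value sits just before or at the insertion point.
        i = bisect_left(expected, mz)
        neighbours = expected[max(i - 1, 0):i + 1]
        if any(abs(mz - e) <= tol for e in neighbours):
            matched += intensity
    return matched / total


def rank_candidates(observed, candidates, tol=0.5):
    """Score every candidate protein and return (name, score) pairs, best first."""
    scores = {
        name: match_score(observed, mz_values, tol)
        for name, mz_values in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up illustrative values, not experimental data.
    observed = [(500.3, 100.0), (750.4, 50.0), (900.5, 50.0)]
    candidates = {
        "protein_A": [500.2, 750.5, 1200.0],
        "protein_B": [900.4, 1500.0],
    }
    for name, score in rank_candidates(observed, candidates):
        print(name, round(score, 2))
```

In practice the expected values would be derived from the sequence collection (for example by in silico digestion), and the raw scores would feed a statistical model rather than a simple ranking.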