ABSTRACT

Early detection is critical for disease control and prevention. Molecular biomarkers provide valuable information about the status of a cell at a given time point. Biomarker research has benefited from recent technological advances such as gene expression microarrays and, more recently, proteomics. Motivated by specific problems involving proteomic profiles generated by Matrix-Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF) mass spectrometry, we propose model-based inference with mixtures of beta distributions for real-time discrimination in the context of protein biomarker discovery. Most biomarker discovery projects aim to identify features in proteomic profiles that distinguish cancer samples from normal samples, different stages of disease development from one another, or different experimental conditions (such as different treatment arms). The key to our method is a fully model-based approach, with coherent joint inference across most steps of the analysis. The end product is a probability model over a list of protein masses corresponding to peaks in the observed spectra, together with a probability model on indicators of differential expression for these proteins. This probability model provides a single coherent summary of the uncertainty arising in multiple steps of the data analysis, including baseline subtraction, smoothing and peak identification. Some ad hoc choices remain, including parts of the pre-processing and the solution of the label-switching problem when summarizing the simulation output.
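As a rough illustration of the modeling idea, and not the authors' implementation, the sketch below represents a mean spectrum as a weighted mixture of beta densities on a mass range rescaled to the unit interval, with one component per protein peak. The mass range [2000, 10000] Da, the mean/concentration parameterization with one shared concentration, the grid-based peak finder and all function names are hypothetical choices made for illustration. Posterior simulation over the number, locations and weights of the components is not shown.

```python
import math
from typing import List, Sequence


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density at x, evaluated on the log scale for numerical stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))


def mixture_density(x: float, weights: Sequence[float], means: Sequence[float],
                    concentration: float) -> float:
    """Weighted sum of beta kernels. Component j uses the (assumed) parameterization
    a = means[j] * concentration, b = (1 - means[j]) * concentration."""
    total = 0.0
    for w, m in zip(weights, means):
        total += w * beta_pdf(x, m * concentration, (1.0 - m) * concentration)
    return total


def to_unit(mass: float, m_min: float, m_max: float) -> float:
    """Rescale a mass (Da) from the hypothetical range [m_min, m_max] to [0, 1]."""
    return (mass - m_min) / (m_max - m_min)


def to_mass(u: float, m_min: float, m_max: float) -> float:
    """Map a point in [0, 1] back to the mass scale."""
    return m_min + u * (m_max - m_min)


def find_peaks(xs: Sequence[float], ys: Sequence[float], min_height: float) -> List[float]:
    """Grid points that are strict local maxima with height at least min_height."""
    return [xs[i] for i in range(1, len(ys) - 1)
            if ys[i] >= min_height and ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]


if __name__ == "__main__":
    M_MIN, M_MAX = 2000.0, 10000.0      # hypothetical mass range (Da)
    peak_masses = [4000.0, 7000.0]      # hypothetical protein masses (Da)
    weights = [0.4, 0.6]                # hypothetical mixture weights
    means = [to_unit(m, M_MIN, M_MAX) for m in peak_masses]
    grid = [i / 1000 for i in range(1, 1000)]  # interior grid on (0, 1)
    curve = [mixture_density(x, weights, means, concentration=400.0) for x in grid]
    for u in find_peaks(grid, curve, min_height=1.0):
        print(f"peak near {to_mass(u, M_MIN, M_MAX):.0f} Da")
```

The recovered peak locations sit close to, but not exactly at, the component means, because the mode of a Beta(a, b) kernel is (a - 1)/(a + b - 2) rather than a/(a + b). In a full Bayesian treatment such peak summaries would be computed across posterior draws, which is where the label-switching problem arises.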