ABSTRACT

Protein-protein interactions play important roles in most fundamental cellular processes, so it is important to develop effective statistical approaches for predicting them from recently available high-throughput experimental data. Because protein domains are the functional units of proteins, and protein-protein interactions are mostly achieved through domain-domain interactions, modeling and analyzing protein interactions at the domain level may be more informative. However, the number of domains is large, so the number of parameters to be estimated is very large, whereas the number of observed protein-protein interactions is relatively small; the amount of information available for statistical inference is therefore quite limited. In this chapter we describe a Bayesian method that simultaneously estimates domain-domain interaction probabilities and the false positive and false negative rates of high-throughput data by integrating data from several organisms. Because we expect the domain and protein interaction networks to be sparse, we apply a point-mass mixture prior to incorporate network sparsity. We compare prediction results from models with and without the sparsity prior using high-throughput yeast two-hybrid data from different organisms. Our results clearly demonstrate the advantages of the Bayesian approach with a sparsity prior in modeling and predicting protein interaction data.
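
To make the ingredients named above concrete, the following is a minimal Python sketch, not the chapter's implementation. It assumes (the abstract does not state this) that a protein pair interacts if at least one of its domain pairs interacts, with domain pairs acting independently, and that the non-zero component of the point-mass mixture prior is a Beta distribution. All function and parameter names (`sample_sparse_lambda`, `p_zero`, `fp`, `fn`, and so on) are hypothetical and chosen only for illustration.

```python
import random


def sample_sparse_lambda(rng, p_zero, a=1.0, b=1.0):
    """Draw one domain-domain interaction probability from a point-mass
    mixture prior: exactly 0 with probability p_zero, otherwise Beta(a, b).
    The Beta slab is an assumption made for this sketch."""
    if rng.random() < p_zero:
        return 0.0
    return rng.betavariate(a, b)


def protein_interaction_prob(domain_pairs, lam):
    """Probability that two proteins truly interact, assuming they interact
    when at least one of their domain pairs interacts and that domain pairs
    act independently (an assumption, not stated in the abstract).
    Domain pairs missing from lam are treated as non-interacting."""
    prob_no_pair_interacts = 1.0
    for pair in domain_pairs:
        prob_no_pair_interacts *= 1.0 - lam.get(pair, 0.0)
    return 1.0 - prob_no_pair_interacts


def observed_interaction_prob(p_true, fp, fn):
    """Probability that a high-throughput assay reports an interaction,
    given the true interaction probability, false positive rate fp,
    and false negative rate fn."""
    return p_true * (1.0 - fn) + (1.0 - p_true) * fp


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical domain pairs for one protein pair.
    pairs = [("D1", "D2"), ("D1", "D3")]
    lam = {pair: sample_sparse_lambda(rng, p_zero=0.9) for pair in pairs}
    p_true = protein_interaction_prob(pairs, lam)
    print(lam, p_true, observed_interaction_prob(p_true, fp=0.01, fn=0.5))
```

Under these assumptions, a large `p_zero` forces most domain-domain interaction probabilities to be exactly zero, which is one way a sparsity prior can reduce the effective number of parameters when observed interactions are few.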