ABSTRACT

Tumor sequencing data are often highly heterogeneous because most tumor samples contain not only cancerous tissue but also non-cancerous blood vessels, stromal cells, and adjacent tissues. These non-cancerous components are important for cancer biology, should be taken into account in the computational analysis of cancer genomics data, and in many cases matter for the biological and medical interpretation of the results. Even the cancerous component of a tumor can be heterogeneous: different subclones harbor partially overlapping sets of somatic alterations that may be responsible for the distinct behavior of individual subclones. A number of computational methods have been developed to estimate tumor purity from genomic data. PurBayes can be used to estimate both the tumor purity of a sample and the number of subclonal populations present in a tumor.