ABSTRACT

Many factors contribute to the development of different types and subtypes of cancer, including the tissue type, the primary site of occurrence, and the accumulation of sequence and structural variations. Identifying the cancer subtype is the first step toward an individualized treatment plan and precision medicine. Over the past two decades, considerable research effort has gone into classifying cancer subtypes and discovering novel classes. Machine learning (ML) approaches have shown promise as transcriptomic data have become increasingly available. In this review we introduce various aspects of ML approaches and discuss a few important applications in this area. Feature selection and data imbalance are two major hurdles in cancer classification: the number of features (genes) is much larger than the number of available patient samples, and some subtypes are rare in the population, so very few samples are available for them. Both make it difficult to apply ML approaches. Another important aspect is the interpretability of ML models when they are applied in a clinical setting. In this chapter we present methods for feature selection, for binary and multi-class classification of cancer samples, and for integrating different types of biological data and/or different types of algorithms to improve classification accuracy, and we discuss interpretability issues.