ABSTRACT

The variety of the programs has led to such a large accumulation of data that the bottleneck of omics approaches now seems to lie in the computational methods used to manage, analyze, and interpret the emerging data rather than in the high-throughput techniques that generate them. Data-driven models analyze the system’s behavior from a bird’s-eye perspective, describing it as a single unit. Depending on the application and the availability of data, the input variables can describe proteins, mutations, gene expression, copy number alterations, metabolites, or the abundances of other biological and chemical molecules. Support vector machines map the input data into a high-dimensional feature space and find there an optimal (maximum-margin) hyperplane that serves as the decision boundary between the classes. A common property of high-dimensional data spaces is the presence of highly correlated data points, which can exhibit both local and global correlation structures.
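
As an illustration of the feature-space idea behind support vector machines, the following minimal sketch uses toy one-dimensional data that no single threshold can separate. The data, the explicit feature map phi(x) = (x, x^2), the function names, and the hyperparameters (lam, lr, epochs) are all assumptions made for this example and are not taken from the text. The classifier is a linear soft-margin SVM trained by plain full-batch subgradient descent on the regularized hinge loss, a simple stand-in for the kernel-based solvers used in practice.

```python
# Toy 1-D data: the outer points (+1) surround the inner points (-1),
# so no single threshold on x separates the two classes.
X = [-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1]


def feature_map(x):
    """Hypothetical explicit feature map phi(x) = (x, x^2)."""
    return (x, x * x)


def score(w, b, x):
    """Signed value of the hyperplane function w . phi(x) + b."""
    phi = feature_map(x)
    return w[0] * phi[0] + w[1] * phi[1] + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(X)
    for _ in range(epochs):
        sum_w = [0.0, 0.0]
        sum_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside or beyond the margin contribute.
            if yi * score(w, b, xi) < 1.0:
                phi = feature_map(xi)
                sum_w[0] += yi * phi[0]
                sum_w[1] += yi * phi[1]
                sum_b += yi
        grad_w = [lam * w[0] - sum_w[0] / n, lam * w[1] - sum_w[1] / n]
        grad_b = -sum_b / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class label from the side of the hyperplane phi(x) falls on."""
    return 1 if score(w, b, x) >= 0.0 else -1


w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In the mapped space the separating hyperplane relies almost entirely on the x^2 coordinate, which corresponds to a pair of thresholds in the original variable. Kernel SVMs obtain the same effect without constructing the feature map explicitly.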