ABSTRACT

This chapter provides an overview of supervised learning techniques, better known as classification techniques, and their application in the field of bioinformatics. The statistical conclusions drawn from the experiments reinforce the hypothesis that both linear and nonlinear classifiers perform better as the amount of training data increases. Learning methods with high variance are prone to overfitting the training data, which may prevent them from capturing the true properties of the underlying distribution. Support Vector Machines (SVMs) are based on two key concepts: the margin of separation and kernel functions. In a linearly separable dataset, a hyperplane can correctly classify all data points, and there may be many such separating hyperplanes. The data used in the study consist of protein domains belonging to different superfamilies defined by the Structural Classification of Proteins (SCOP) version. The chapter covers important concepts of supervised learning, such as bias, variance, and model complexity.