ABSTRACT

Classifiers such as linear discriminant analysis assume a linear separation boundary between the classes. Kernel-based classifiers rest on the principle that a suitable transformation of the data creates a new data space in which the nonlinear separation boundary becomes a linear one. Kernel-based techniques have found many applications in predicting the retention indices of series of compounds on particular liquid chromatography or gas chromatography columns. The concept of a classification tree is rather intuitive; the challenge is to construct such a tree for a real problem. Classification trees, like other supervised methods, are constructed through a training phase. The general idea is to construct hundreds of classification trees on different random subsets of the training data. Taken together, these trees drastically reduce the chance of overfitting. This method is therefore called a random forest. The ultimate goal of a random forest is to predict the class labels of new samples.
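The random-forest procedure described above (many trees grown on random subsets of the training data, aggregated by majority vote to predict class labels for new samples) can be sketched as follows; this is a minimal illustration using scikit-learn's `RandomForestClassifier` on hypothetical synthetic data, not the study's actual data or settings.

```python
# Minimal random-forest sketch: hundreds of trees, each trained on a
# bootstrap sample of the training data, vote on the class of new samples.
# The data below are synthetic stand-ins (e.g., for compound descriptors).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical two-class data set.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 200 trees, each grown on a different random (bootstrap) subset;
# aggregating them reduces the overfitting risk of any single tree.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# The forest predicts class labels for unseen samples by majority vote.
accuracy = forest.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Each tree in the ensemble sees a different bootstrap sample (and, at each split, a random subset of features), which is what decorrelates the trees and makes their aggregated vote more robust than any single classification tree.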