ABSTRACT

This chapter discusses the decision tree model, which is based on a different approach to classification and classifiers and is especially appealing when the input data is not vectorial. The decision tree model is one of the very few machine learning techniques that can accommodate the special nature of protein data and handle a mix of features of different types, including binary and numerical. Decision trees are especially useful when some or all of the data features are nominal, i.e., when there is no natural notion of similarity between objects. The tree structure that is learned from the data incorporates features that maximize the accuracy when predicting the class membership. Hence, it highlights properties that are strongly correlated with functional aspects of the protein family being modeled. The chapter also discusses protein representation under the decision tree model and feature extraction and processing.