ABSTRACT

CONTENTS

3.1 Introduction
3.2 Tree-Based Methods
    3.2.1 Tree-Based Methods for Survival Data
3.3 Artificial Neural Networks
    3.3.1 Neural Networks for Survival Data
3.4 Random Forest
    3.4.1 Random Forest for Survival Data
3.5 Logic Regression
3.6 Detroit Breast Cancer Study
3.7 Conclusions
References

Machine learning methods have become increasingly popular as powerful analytic tools for exploring complex data structures. The applications of these methods are far-reaching. The best-documented, and arguably most popular, uses of machine learning methods are in biomedical research, where classification is a central issue. For example, a clinician may be interested in the following question: Does this patient with an enlarged prostate gland have prostate cancer, or does he simply have a benign disease of the prostate? To answer this question, various clinical information on the patient must be collected, and a good diagnostic test utilizing such information must be in place. The goal of machine learning methods is to provide a solution for constructing such diagnostic tests. For applications of machine learning methods in molecular biology and genomics, see Meller and Wagner (2007) in this volume.