Classification and Disease Prediction via Mathematical Programming
Eva K. Lee and Tsung-Lin Wu
Abstract In this chapter, we present classification models based on mathematical programming approaches. We first provide an overview of various mathematical programming approaches, including linear programming, mixed integer programming, nonlinear programming, and support vector machines. Next, we present our work on novel optimization-based classification models that are general purpose and suitable for developing predictive rules for large heterogeneous biological and medical datasets. Our predictive model simultaneously incorporates (1) the ability to classify any number of distinct groups; (2) the ability to incorporate heterogeneous types of attributes as input; (3) a high-dimensional data transformation that eliminates noise and errors in biological data; (4) the ability to incorporate constraints to limit the rate of misclassification, and a reserved-judgment region that provides a safeguard against overtraining (which tends to lead to high misclassification rates from the resulting predictive rule); and (5) successive multistage classification capability to handle data points placed in the reserved-judgment region. To illustrate the power and flexibility of the classification model and solution engine, and its multigroup prediction capability, we describe the application of the predictive model to a broad class of biological and medical problems. Applications include the differential diagnosis of erythemato-squamous disease types; prediction of the presence or absence of heart disease; genomic analysis and prediction of aberrant CpG island methylation in human cancer; discriminant analysis of motility and morphology data in human lung carcinoma; prediction of ultrasonic cell disruption for drug delivery; identification of tumor shape and volume in the treatment of sarcoma; multistage discriminant analysis of biomarkers for prediction of early atherosclerosis; fingerprinting of native and angiogenic microvascular networks for early diagnosis of diabetes, aging, macular degeneration, and tumor metastasis; prediction of protein localization sites; and pattern recognition of satellite images in classification of soil types. In all these applications, the predictive model yields correct classification rates ranging from 80 to 100 percent. This provides motivation for pursuing its use as a medical diagnostic, monitoring, and decision-making tool.
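To make features (4) and (5) concrete, the following minimal Python sketch shows one generic way a reserved-judgment region and successive classification stages can interact. It is not the optimization model developed in this chapter. The score functions, the thresholds, and the names `classify_with_reserve`, `multistage_classify`, `stage1_scores`, and `stage2_scores` are hypothetical and chosen only for illustration. Each stage either commits to a group or defers the point to the next stage.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical marker for "placed in the reserved-judgment region".
RESERVED = None

ScoreFn = Callable[[float], Sequence[float]]


def classify_with_reserve(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the best-scoring group, or RESERVED if that
    score falls below the (assumed) confidence threshold."""
    best = max(range(len(scores)), key=lambda g: scores[g])
    return best if scores[best] >= threshold else RESERVED


def multistage_classify(x: float, stages: Sequence[Tuple[ScoreFn, float]]) -> Optional[int]:
    """Try each (score_fn, threshold) stage in order and stop at the first
    stage that commits to a group. Points that every stage defers stay RESERVED."""
    for score_fn, threshold in stages:
        label = classify_with_reserve(score_fn(x), threshold)
        if label is not RESERVED:
            return label
    return RESERVED


# Toy score functions (assumptions, not fitted to any data).
def stage1_scores(x: float) -> Sequence[float]:
    # Ambiguous near zero, confident elsewhere.
    if abs(x) < 1.0:
        return [0.5, 0.5]
    return [0.9, 0.1] if x < 0 else [0.1, 0.9]


def stage2_scores(x: float) -> Sequence[float]:
    # A second rule that also separates the points stage 1 left undecided.
    return [0.7, 0.3] if x < 0 else [0.3, 0.7]


stages = [(stage1_scores, 0.8), (stage2_scores, 0.6)]
far_point = multistage_classify(-2.0, stages)    # decided at stage 1
near_point = multistage_classify(0.5, stages)    # deferred, then decided at stage 2
single_stage = multistage_classify(0.5, stages[:1])  # stays reserved
```

In this sketch, raising a stage's threshold enlarges its reserved-judgment region. That trades fewer committed classifications for fewer misclassifications at that stage, and later stages get a chance to resolve the deferred points.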