ABSTRACT

CONTENTS
4.1 Introduction
4.2 Results
4.3 Conclusions
Acknowledgments
References

This paper discusses a classification system for the detection of various chemical warfare agents. The data were collected as part of the Shipboard Automatic Liquid (Chemical) Agent Detection (SALAD) system, which is designed to detect chemical agents onboard naval vessels. We explore the intricacies of constructing various classification systems and, along the way, apply recently developed statistical procedures in visualization and density estimation to this discriminant analysis problem. Our discussion covers all phases of the discriminant analysis problem. In the exploratory data analysis phase, we present results detailing the use of histograms, scatter plots, and parallel coordinate plots for selecting feature subsets that are favorable to discriminant analysis and for discerning high-dimensional data structure. In the discriminant analysis phase, we discuss several semiparametric density estimation procedures along with approaches based on classical kernel estimators, classification and regression trees, and k-nearest neighbors. These discussions include illustrations of a new parallel coordinates framework for visualizing high-dimensional mixture models. We close by comparing the performance of the various techniques through a study of the associated confusion matrices.