Chapter 5
Feature Selection and Classification in the Diagnosis of Cervical Cancer

Jennifer Hallinan
Institute for Molecular Biosciences
University of Queensland
St. Lucia, Brisbane, Australia 4072

5.1 Introduction

Cervical cancer is one of the most common cancers, accounting for 6% of all malignancies in women (National Cancer Institute, 1999). The standard screening test for cervical cancer is the Papanicolaou (or “Pap”) smear, which involves visual examination of cervical cells under a microscope for evidence of abnormality (Mackay, Beischer, Cox & Wood, 1983). Pap smear screening is labour-intensive and tedious but requires high precision, and so appears, on the surface, to be well suited to automation. Research in this area has been under way since the late 1950s (Husain & Watts, 1988); it is one of the “classical” problems in automated image analysis (see also Banda-Gamboa, Ricketts, Cairns, Hussein, Tucker & Husain, 1992; Bartels, 1992; and Danielson, Kanagasingam, Jirgensen, Reith & Nesland, 1994).

It was initially assumed that an automated system would operate in essentially the same way as an expert human cytologist, scanning slides visually and looking for the same changes in cells that the human would detect. Unfortunately, progress has been slow. Abnormal cells may represent only a few of the thousands of cells on a slide, and they may be difficult or impossible for a machine vision system to detect amid the irregular clumps of different cell types and the scattered debris commonly present on a Pap smear slide.

In the last four decades or so, with the advent of powerful, reasonably priced computers and sophisticated algorithms, an alternative to the identification of malignant cells on a slide has become possible. This is the detection of so-called Malignancy Associated Changes (MACs): subvisual alterations to the texture of apparently normal cells from the vicinity of a cancerous or precancerous lesion.

The approach generally used for MAC detection is to capture digital images of visually normal cells from patients of known diagnosis (cancerous/precancerous condition or normal). A variety of features, such as nuclear area, optical density, shape and texture measures, are then calculated from the images, and linear discriminant analysis is used to classify individual cells as either “normal” or “abnormal.” An individual is then given a diagnosis on the basis of the proportion of abnormal cells detected on her Pap smear slide (Figure 5.1).
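The two-stage scheme just described (classify each cell, then diagnose the slide from the proportion of abnormal cells) can be illustrated with a small sketch. What follows is not the implementation used in any of the cited studies. It is a minimal two-class Fisher linear discriminant written in plain Python, restricted to two features per cell for brevity. The feature values are synthetic random numbers generated purely for illustration, and the feature interpretation (nuclear area, optical density), the function names and the 0.5 slide-level cut-off are all assumptions made for this example.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(class_a, class_b):
    """Pooled within-class covariance (2x2) for two classes of 2-feature cells."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (class_a, class_b):
        mu = mean_vector(rows)
        for x in rows:
            d = [x[0] - mu[0], x[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    return [[v / dof for v in row] for row in s]

def fit_lda(normal_cells, abnormal_cells):
    """Two-class Fisher discriminant; returns (weights, threshold)."""
    mu_n, mu_a = mean_vector(normal_cells), mean_vector(abnormal_cells)
    (a, b), (c, d) = pooled_covariance(normal_cells, abnormal_cells)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [mu_a[0] - mu_n[0], mu_a[1] - mu_n[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut at the midpoint of the two class means (assumes equal priors).
    mid = [(mu_a[0] + mu_n[0]) / 2, (mu_a[1] + mu_n[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]

def is_abnormal(cell, w, threshold):
    """Cell-level decision: which side of the discriminant the cell falls on."""
    return w[0] * cell[0] + w[1] * cell[1] > threshold

def diagnose_slide(cells, w, threshold, min_abnormal_fraction=0.5):
    """Slide-level decision from the proportion of cells called abnormal.

    The 0.5 default is an arbitrary illustrative cut-off, not a clinical value.
    """
    fraction = sum(is_abnormal(c, w, threshold) for c in cells) / len(cells)
    label = "abnormal" if fraction >= min_abnormal_fraction else "normal"
    return label, fraction

# Synthetic data only: (nuclear area, optical density) pairs drawn from two
# overlapping Gaussians. The numbers have no clinical meaning.
rng = random.Random(0)

def synthetic_cells(n, area_mean, density_mean):
    return [(rng.gauss(area_mean, 4.0), rng.gauss(density_mean, 0.03))
            for _ in range(n)]

train_normal = synthetic_cells(300, 50.0, 0.30)
train_abnormal = synthetic_cells(300, 56.0, 0.36)
weights, cut = fit_lda(train_normal, train_abnormal)

slide_from_normal_patient = synthetic_cells(200, 50.0, 0.30)
slide_from_abnormal_patient = synthetic_cells(200, 56.0, 0.36)
print(diagnose_slide(slide_from_normal_patient, weights, cut))
print(diagnose_slide(slide_from_abnormal_patient, weights, cut))
```

Because the two synthetic classes overlap, some individual cells are misclassified; the slide-level label depends only on the proportion of cells called abnormal, which is why the scheme can tolerate a moderate cell-level error rate.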