ABSTRACT

This chapter deals with active feature value acquisition for feature relevance estimation in domains where feature values are expensive to measure. The following two examples motivate our work.

Example 1: Molecular reagents called biomarkers are studied for cancer characterization by testing them on biological (e.g., tissue) samples from patients who have been monitored for several years and labeled according to their cancer relapse and survival status. New biomarkers are tested on these biological samples with the goal of obtaining a subset of biomarkers that characterize the disease. In addition to the relapse and survival information, information such as the grade of the disease, tumor dimensions, and lymph node status is also available for each patient. That is, the samples are class labeled as well as described by some existing features. The goal is to choose, from among many new features (biomarkers), the subset that is most informative about the class label given the existing features. Because a biological sample that has been used to test one biomarker cannot be used to test other biomarkers, it is desirable to evaluate the biomarkers by testing them on as few samples as possible. Once some of the biomarkers are determined to be informative, they can be tested on all the samples. A detailed description of this problem is presented in [20, 8, 12].