ABSTRACT

This chapter discusses two global search methods, genetic algorithms and simulated annealing, in the context of selecting appropriate subsets of features. A variety of other global search methods can also be used, such as particle swarm optimization and simultaneous perturbation stochastic approximation. The OkCupid data are used in conjunction with the naive Bayes classification model, which is computationally efficient and therefore well suited to illustrating global search methods. Annealing is the process of heating a metal or glass to remove imperfections and improve the strength of the material. The annealing process that particles within a material undergo can be abstracted and applied to global optimization. Simulated annealing is a controlled random search: each new candidate feature subset is generated by a random perturbation of the current subset. Because predictors enter and leave the subset many times during the search, after a sufficient number of iterations a data set can be created to quantify the difference in performance with and without each predictor.