ABSTRACT

This chapter presents basic descriptive and inferential statistical methods. It introduces the concept of clustering and explains different approaches: hierarchical clustering, partitional clustering, spectral clustering, affinity propagation and probabilistic clustering. The chapter then covers supervised classification methods, illustrating non-probabilistic classifiers (k-nearest neighbors, classification trees, rule induction, artificial neural networks and support vector machines), probabilistic classifiers (logistic regression and Bayesian classifiers) and metaclassifiers. It reviews Bayesian networks, probabilistic graphical models that are well suited to dynamic scenarios, a common feature in industry. The chapter also addresses imputation methods for dealing with missing data and variable transformation schemes, such as standardization, transformations toward Gaussianity and discretization. It discusses the basic concepts of parameter estimation (point estimation and confidence intervals) and hypothesis testing. Finally, it describes the characteristics of some of the most popular machine learning software tools for clustering, supervised classification, Bayesian networks and dynamic environments.