ABSTRACT

This chapter introduces the concepts from probability theory needed for machine learning: independence, the rules of probability, Simpson's paradox, probability mass and density functions, cumulative distribution functions, and the definitions of expectation, variance, and moments. Discretization is presented as one way of handling probabilities for a continuous real-valued variable. The chapter then describes commonly used probability distributions, beginning with those for discrete variables, and shows how moments can inform the choice of probability density function for modelling a random variable. The chapter ends with graphical representations of random variables, their dependencies, and their parameters. In such a representation, each node denotes either a random variable, which may be a vector or a group of random variables, or a parameter or group of parameters; random-variable nodes are drawn as circles, while parameter nodes are drawn without circles.