ABSTRACT

This chapter presents the basics of probability and statistics required for research in cyber security and for understanding cyber security data and analysis methods. Statistics is one of the most important pillars underpinning the analysis of cyber data. It deals with the problem of inference: deciding what is happening and determining how confident one can be in those decisions. Statistics is the mathematics, science, and art of making reliable inferences and decisions from data. The probability density function is the continuous analog of the probability mass function. It is important to use both parametric and nonparametric models and to investigate different methods of each type. Several statistical models are relevant for cyber security analytics. Regularization refers to adding constraints or additional information to one’s model to reduce overfitting and to obtain sparser or more parsimonious models.