ABSTRACT

Statistics and probability form the basis for understanding data science algorithms and implementing data science applications. This chapter introduces the fundamental terminology and definitions of data types and variables. It then explains the basic statistics required in data science, together with sampling techniques. Information analysis and computation based on information gain theory are elaborated. Probability theory, including its types and distributions, is demonstrated with examples. Bayes' theorem and its use in data science applications are explained. Finally, the concept of inferential statistics is introduced to support learning in data science.