ABSTRACT

Probability and Statistics for Data Science: Math + R + Data covers "math stat" (distributions, expected value, estimation, etc.) but takes the phrase "Data Science" in the title quite seriously:

* Real datasets are used extensively.

* All data analysis is supported by R coding.

* Includes many Data Science applications, such as PCA, mixture distributions, random graph models, Hidden Markov models, linear and logistic regression, and neural networks.

* Leads the student to think critically about the "how" and "why" of statistics, and to "see the big picture."

* Not "theorem/proof"-oriented, but concepts and models are stated in a mathematically precise manner.

Prerequisites are calculus, some matrix algebra, and some experience in programming.

Norman Matloff is a professor of computer science at the University of California, Davis, and was formerly a statistics professor there. He is on the editorial boards of the Journal of Statistical Software and The R Journal. His book Statistical Regression and Classification: From Linear Models to Machine Learning was the recipient of the Ziegel Award for the best book reviewed in Technometrics in 2017. He is a recipient of his university's Distinguished Teaching Award.

part I|1 page

Fundamentals of Probability

chapter Chapter 1|32 pages

Basic Probability Models

chapter Chapter 2|10 pages

Monte Carlo Simulation

chapter Chapter 3|19 pages

Discrete Random Variables: Expected Value

chapter Chapter 4|18 pages

Discrete Random Variables: Variance

chapter Chapter 5|30 pages

Discrete Parametric Distribution Families

chapter Chapter 6|34 pages

Continuous Probability Models

part II|2 pages

Fundamentals of Statistics

chapter Chapter 7|22 pages

Statistics: Prologue

chapter Chapter 8|26 pages

Fitting Continuous Models

chapter Chapter 9|20 pages

The Family of Normal Distributions

chapter Chapter 10|26 pages

Introduction to Statistical Inference

part III|1 page

Multivariate Analysis

chapter Chapter 11|20 pages

Multivariate Distributions

chapter Chapter 12|9 pages

The Multivariate Normal Family of Distributions

chapter Chapter 13|12 pages

Mixture Distributions

chapter Chapter 14|21 pages

Multivariate Description and Dimension Reduction

chapter Chapter 15|34 pages

Predictive Modeling

chapter Chapter 16|6 pages

Model Parsimony and Overfitting

chapter Chapter 17|15 pages

Introduction to Discrete Time Markov Chains