ABSTRACT

The breadth of problems that can be solved with data science is astonishing, and this book provides the required tools and skills for a broad audience. The reader takes a journey into the forms, uses, and abuses of data and models, and learns how to critically examine each step. Python coding and data analysis skills are built from the ground up, with no prior coding experience assumed. The necessary background in computer science, mathematics, and statistics is provided in an approachable manner.

Each step of the machine learning lifecycle is discussed, from business objective planning to monitoring a model in production. This end-to-end approach supplies the broad view necessary to sidestep many of the pitfalls that can sink a data science project. Detailed examples are provided from a wide range of applications and fields, from fraud detection in banking to breast cancer classification in healthcare. The reader will learn techniques for tasks that include predicting outcomes, explaining observations, and detecting patterns.

Improper use of data and models can introduce unwanted effects and dangers to society. A chapter on model risk provides a framework for comprehensively challenging a model and mitigating weaknesses. When data is collected, stored, and used, it may misrepresent reality and introduce bias. Strategies for addressing bias are discussed.

From Concepts to Code: Introduction to Data Science leverages content developed by the author for a full-year data science course suitable for advanced high school or early undergraduate students. The course is freely available and includes weekly lesson plans.

Chapter 1: Introduction (8 pages)
Chapter 3: Data Science Project Planning (10 pages)
Chapter 4: An Overview of Data (22 pages)
Chapter 5: Computing Preliminaries and Setup (26 pages)
Chapter 6: Data Processing (24 pages)
Chapter 7: Data Storage and Retrieval (22 pages)
Chapter 8: Mathematics Preliminaries (38 pages)
Chapter 9: Statistics Preliminaries (16 pages)
Chapter 10: Data Transformation (16 pages)
Chapter 11: Exploratory Data Analysis (22 pages)
Chapter 12: An Overview of Machine Learning (26 pages)
Chapter 13: Modeling with Linear Regression (24 pages)
Chapter 14: Classification with Logistic Regression (20 pages)
Chapter 15: Clustering with K-Means (18 pages)
Chapter 16: Elements of Reproducible Data Science (12 pages)
Chapter 17: Model Risk (24 pages)
Chapter 18: Next Steps (14 pages)