This textbook shows how to bring theoretical concepts from finance and econometrics to the data. Focusing on coding and data analysis with R, we show how to conduct research in empirical finance from scratch. We start by introducing the concepts of tidy data and coding principles using the tidyverse family of R packages. Code is provided to prepare common open-source and proprietary financial data sources (CRSP, Compustat, Mergent FISD, TRACE) and organize them in a database. We reuse these data in all subsequent chapters, which we keep as self-contained as possible. The empirical applications range from key concepts of empirical asset pricing (beta estimation, portfolio sorts, performance analysis, Fama-French factors) to modeling and machine learning applications (fixed effects estimation, clustering standard errors, difference-in-differences estimators, ridge regression, Lasso, elastic net, random forests, neural networks) and portfolio optimization techniques.


1. Self-contained chapters on the most important applications and methodologies in finance, which can easily be used for the reader’s research or as a reference for courses on empirical finance.

2. Each chapter is reproducible in the sense that the reader can replicate every single figure, table, or number by simply copying and pasting the code we provide.

3. A full-fledged introduction to machine learning with tidymodels, based on tidy principles, that shows how factor selection and option pricing can benefit from machine learning methods.

4. Chapter 2 on accessing and managing financial data shows how to retrieve and prepare the most important datasets in financial economics: CRSP and Compustat. The chapter also contains detailed explanations of the most relevant data characteristics.

5. Each chapter provides exercises, based on established lectures and classes, that are designed to help students dig deeper. The exercises can be used for self-study or as a source of inspiration for teaching exercises.

part I|20 pages

Getting Started

chapter 1|18 pages

Introduction to Tidy Finance

part II|48 pages

Financial Data

chapter 2|12 pages

Accessing & Managing Financial Data

chapter 3|20 pages

WRDS, CRSP, and Compustat

chapter 4|10 pages


chapter 5|4 pages

Other Data Providers

part III|64 pages

Asset Pricing

chapter 6|16 pages

Beta Estimation

chapter 7|12 pages

Univariate Portfolio Sorts

chapter 8|12 pages

Size Sorts and p-Hacking

chapter 9|8 pages

Value and Bivariate Sorts

chapter 10|8 pages

Replicating Fama and French Factors

chapter 11|6 pages

Fama-MacBeth Regressions

part IV|58 pages

Modeling & Machine Learning

chapter 12|10 pages

Fixed Effects and Clustered Standard Errors

chapter 13|12 pages

Difference in Differences

chapter 14|22 pages

Factor Selection via Machine Learning

chapter 15|12 pages

Option Pricing via Machine Learning

part V|32 pages

Portfolio Optimization

chapter 16|12 pages

Parametric Portfolio Policies

chapter 17|18 pages

Constrained Optimization and Backtesting