ABSTRACT

This chapter simulates and analyzes gene expression data. Statisticians simulate random numbers representing gene expression levels to test statistical methods for identifying affected genes. A good simulation is mathematically equivalent to a biologically realistic game, one that generates artificial data in much the same way that experiments or observational studies generate biological data. The chapter describes simulation algorithms in terms of simple dice games. It explains normal distributions to bridge the two approaches while clarifying the idea of a probability density. In the differentially expressed (DE) games, Lab’s goal is to determine which genes are differentially expressed, that is, affected by the disease. By contrast, in the game called Effects and Estimates (E&E), Lab’s goal is to estimate the size of the disease effect, that is, the true mean expression level of each gene.