ABSTRACT
Given gene expression data from some set of experimental conditions, one of the
first questions that the analyst wants to answer is which genes differ across conditions.
At first glance, many of these questions can be dealt with most simply by
conducting an analysis on a gene by gene basis using some spot level summary such
as RMA. For the simplest sort of analysis, suppose we have a certain number of
biological replicates from 2 distinct classes and want to address the question of which
genes differ between the 2 classes. Anyone with some training in statistics would
recognize this as the classical 2 sample problem and would know how to proceed.
One would perhaps examine the distribution of the measurements within each class
using a histogram to assess if the data depart strongly from normality and consider
appropriate transformations of the data or perhaps even a non-parametric test (such
as the Wilcoxon test). One would perhaps assess whether the variances were unequal and,
if so, correct for this by using Welch’s modified 2 sample t-test; otherwise one could use the usual common variance 2 sample t-test. While there is some evidence that this is not optimal, such a procedure would be appropriate. As an example,
consider Figure 13.1. This figure displays a histogram of the p-values one obtains if one uses a standard 2 sample t-test to test for differences in gene expression between a set of HIV negative patients and a set of HIV positive patients for all genes on an
Affymetrix microarray. The tissue used in this application was taken from the lymph
nodes of the patients, and we expect large differences between HIV negative and HIV
positive patients in this organ because HIV attacks cells that are important for normal
functioning of the immune system. If no genes are differentially expressed, then we
expect this figure to look like a histogram of a uniform distribution. That does not
appear to be the case, since many genes have small p-values (for this example, over 12,000 genes have p-values that are less than 0.05).
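
The gene-by-gene procedure described above can be sketched as follows. This is only a minimal illustration using the Python standard library, not the analysis behind Figure 13.1. The function names (welch_t, pooled_t, expected_null_hits), the toy expression values, and the array size of 20,000 genes are all assumptions made for the example. The sketch computes the test statistics and degrees of freedom only; turning them into p-values would need the t distribution's CDF from a statistics library, which is not shown here.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume the two classes share a common variance.
    """
    nx, ny = len(x), len(y)
    # Squared standard errors of the two class means.
    se2_x = variance(x) / nx
    se2_y = variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1)
    )
    return t, df


def pooled_t(x, y):
    """Usual common-variance 2 sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    # Pooled estimate of the (assumed common) variance.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def expected_null_hits(n_genes, alpha=0.05):
    """Expected number of p-values below alpha if no gene differs.

    Under the null hypothesis the p-values are uniform on [0, 1],
    so a fraction alpha of them falls below alpha on average.
    """
    return alpha * n_genes


# Toy expression values for one gene in two classes (made-up numbers).
class_a = [1.0, 2.0, 3.0, 4.0, 5.0]
class_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_welch, df_welch = welch_t(class_a, class_b)
t_pooled, df_pooled = pooled_t(class_a, class_b)
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
print(f"Pooled: t = {t_pooled:.3f}, df = {df_pooled}")

# Hypothetical array with 20,000 genes (an assumption, not the size of the
# array used in the text).
print(expected_null_hits(20000))
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, which is why the choice between the two tests matters most when the classes are unbalanced. The last function gives a rough yardstick for reading a p-value histogram: a count of small p-values far above alpha times the number of genes suggests that many genes are differentially expressed.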