ABSTRACT

The plaid model is an additive biclustering method (Lazzeroni and Owen, 2002; Turner et al., 2005) that defines the expression level of the ith gene under the jth condition as a sum of biclusters (layers) in the expression matrix. Let Y be an N × M data matrix whose rows represent genes and whose columns represent conditions. For K biclusters, the gene expression level is expressed as a linear model of the form

$$
Y_{ij} = \mu_0 + \sum_{k=1}^{K} \theta_{ijk}\,\rho_{ik}\,\kappa_{jk} + \varepsilon_{ij}, \qquad i = 1, \ldots, N, \quad j = 1, \ldots, M. \tag{6.1}
$$

Here, µ0 is an overall effect and εij is a Gaussian error with mean zero and variance σ². As pointed out by Turner et al. (2005), the background effect need not be constant and can instead take the form µ0 + αi0 + βj0. The parameters ρik and κjk are binary indicators that encode the membership of gene i and condition j, respectively, in bicluster k in the following way:

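$$
\rho_{ik} =
\begin{cases}
1 & \text{if gene } i \text{ belongs to bicluster } k,\\
0 & \text{otherwise,}
\end{cases}
\tag{6.2}
$$

and

$$
\kappa_{jk} =
\begin{cases}
1 & \text{if condition } j \text{ belongs to bicluster } k,\\
0 & \text{otherwise.}
\end{cases}
$$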
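To make the roles of the membership indicators concrete, the following is a minimal NumPy sketch of the mean structure of model (6.1), optionally with the non-constant background µ0 + αi0 + βj0, plus Gaussian noise. It assumes the additive layer form θijk = µk + αik + βjk used in the original plaid formulation; this section does not itself define θijk. The function names `plaid_mean` and `layer_effects`, the toy sizes, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def plaid_mean(mu0, theta, rho, kappa, alpha0=None, beta0=None):
    """Mean of Y under model (6.1), i.e. everything except the error term.

    mu0    : overall effect (scalar)
    theta  : array (N, M, K) of layer effects theta_ijk
    rho    : binary array (N, K); rho[i, k] = 1 if gene i is in bicluster k
    kappa  : binary array (M, K); kappa[j, k] = 1 if condition j is in bicluster k
    alpha0, beta0 : optional row / column background effects (lengths N, M),
                    giving the background mu0 + alpha_i0 + beta_j0
    """
    N, K = rho.shape
    M = kappa.shape[0]
    background = np.full((N, M), float(mu0))
    if alpha0 is not None:
        background += np.asarray(alpha0, dtype=float)[:, None]
    if beta0 is not None:
        background += np.asarray(beta0, dtype=float)[None, :]
    # rho[i, k] * kappa[j, k] equals 1 exactly when cell (i, j) lies in
    # bicluster k, so each layer contributes only inside its own block;
    # the sum over k lets a cell collect effects from several layers.
    layers = np.einsum("ijk,ik,jk->ij", theta, rho, kappa)
    return background + layers


def layer_effects(mu, alpha, beta):
    """Assumed layer form theta_ijk = mu_k + alpha_ik + beta_jk.

    mu: (K,), alpha: (N, K), beta: (M, K)  ->  theta: (N, M, K)
    """
    return mu[None, None, :] + alpha[:, None, :] + beta[None, :, :]


# Toy example: N = 6 genes, M = 5 conditions, K = 2 overlapping biclusters.
N, M, K = 6, 5, 2
rho = np.zeros((N, K), dtype=int)
kappa = np.zeros((M, K), dtype=int)
rho[0:3, 0] = 1    # bicluster 1: genes 1-3
kappa[0:3, 0] = 1  #              conditions 1-3
rho[2:5, 1] = 1    # bicluster 2: genes 3-5
kappa[2:4, 1] = 1  #              conditions 3-4

# Constant layers (alpha_ik = beta_jk = 0) keep the example easy to read.
theta = layer_effects(mu=np.array([2.0, -1.0]),
                      alpha=np.zeros((N, K)),
                      beta=np.zeros((M, K)))
mean = plaid_mean(0.5, theta, rho, kappa)

# Add the Gaussian error epsilon_ij ~ N(0, sigma^2).
rng = np.random.default_rng(0)
sigma = 0.1
Y = mean + rng.normal(0.0, sigma, size=(N, M))
```

In this toy example, the cell for gene 3 and condition 3 (zero-based index `[2, 2]`) lies in both biclusters, so its mean is 0.5 + 2.0 − 1.0 = 1.5. This additive accumulation of overlapping layers is what distinguishes the plaid model from methods that assign each cell to at most one bicluster.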