The likelihood function $p(Y \mid X, b, \sigma^2)$ expresses how well the parameters fit the data $D = \{X, Y\}$, and is given by (Yuen, 2010):

$$p(Y \mid X, b, \sigma^2) = \left(2\pi\sigma^2\right)^{-N/2} \exp\left[-\frac{1}{2\sigma^2}\,(Y - Xb)^T (Y - Xb)\right] \tag{3}$$
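As a numerical illustration of the Gaussian likelihood in Eq. (3), the following is a minimal sketch that evaluates its logarithm with NumPy. The function name `gaussian_log_likelihood` and its argument names are hypothetical, and N is assumed to be the number of entries in Y. Working in the log domain is a choice made here for numerical stability, not something prescribed by the text.

```python
import numpy as np


def gaussian_log_likelihood(Y, X, b, sigma2):
    """Log of the Gaussian likelihood in Eq. (3).

    Assumes Y has shape (N,), X has shape (N, Nb), b has shape (Nb,),
    and sigma2 is a positive scalar prediction-error variance.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    b = np.asarray(b, dtype=float)
    residual = Y - X @ b          # Y - Xb
    n = Y.shape[0]                # N, assumed to be the number of data points
    # log p = -(N/2) log(2 pi sigma^2) - (Y - Xb)^T (Y - Xb) / (2 sigma^2)
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - (residual @ residual) / (2.0 * sigma2)
```

A parameter vector that reproduces the data exactly attains the largest value for a given σ²; any misfit lowers the log-likelihood through the quadratic residual term.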

The Automatic Relevance Determination (ARD) prior is introduced as follows (MacKay, 1992; Tipping, 2004):

$$p(b \mid \alpha) = G\left(b \mid 0, [A(\alpha)]^{-1}\right) \tag{4}$$

where $A(\alpha) = \mathrm{diag}\{\alpha_1, \alpha_2, \ldots, \alpha_{N_b}\}$ is the precision matrix and $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{N_b}]^T$ is the hyperparameter vector parameterizing the prior Probability Density Function (PDF) of the weight parameter vector $b$.
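To make the role of the hyperparameters concrete, the sketch below evaluates the logarithm of the ARD prior in Eq. (4), assuming G denotes a zero-mean Gaussian with diagonal precision matrix A(α). The name `ard_log_prior` is hypothetical. Because A(α) is diagonal, the density factorizes over the weights, so no matrix inversion is needed.

```python
import numpy as np


def ard_log_prior(b, alpha):
    """Log of the ARD prior in Eq. (4), assuming a zero-mean Gaussian.

    b     : weight vector, shape (Nb,)
    alpha : positive precision hyperparameters, shape (Nb,)
    """
    b = np.asarray(b, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # log G(b | 0, A^{-1}) = 0.5 * sum(log alpha_i)
    #                        - (Nb/2) * log(2 pi)
    #                        - 0.5 * sum(alpha_i * b_i^2)
    return (
        0.5 * np.sum(np.log(alpha))
        - 0.5 * b.size * np.log(2.0 * np.pi)
        - 0.5 * np.sum(alpha * b**2)
    )
```

Each weight has its own precision, so a large value of one α component penalizes a nonzero value of the corresponding weight much more strongly than the others.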