ABSTRACT
The likelihood function $p(Y \mid X, b, \sigma^2)$ expresses how well the parameters fit the data $D = \{X, Y\}$ and is given by (Yuen, 2010):
$$p(Y \mid X, b, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left[-\frac{1}{2\sigma^2}(Y - Xb)^T (Y - Xb)\right] \qquad (3)$$
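As an illustration of Eq. (3), the following minimal sketch evaluates the logarithm of the likelihood numerically. It assumes that Y is a vector of N observations, X is an N-by-N_b design matrix, and the noise is independent Gaussian with variance σ2; the function name `log_likelihood` is hypothetical and not taken from the cited works. Working in log space is a design choice made here to avoid numerical underflow for large N.

```python
import numpy as np

def log_likelihood(Y, X, b, sigma2):
    """Log of the Gaussian likelihood in Eq. (3).

    Assumed shapes (not stated in the source): Y is (N,), X is (N, N_b),
    b is (N_b,), and sigma2 is a positive scalar noise variance.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    b = np.asarray(b, dtype=float)
    N = Y.shape[0]
    residual = Y - X @ b  # prediction error Y - Xb
    # log[(2*pi*sigma2)^(-N/2)] - (Y - Xb)^T (Y - Xb) / (2*sigma2)
    return -0.5 * N * np.log(2.0 * np.pi * sigma2) - (residual @ residual) / (2.0 * sigma2)

# Usage: a parameter vector that reproduces the data scores higher
# than one that leaves a residual.
X_demo = np.eye(2)
Y_demo = np.array([1.0, 2.0])
good_fit = log_likelihood(Y_demo, X_demo, np.array([1.0, 2.0]), 1.0)
poor_fit = log_likelihood(Y_demo, X_demo, np.array([0.0, 0.0]), 1.0)
```

In this toy case the residual of the first parameter vector is zero, so only the normalizing term remains, while the second vector is penalized by its squared residual.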
The Automatic Relevance Determination (ARD) prior is introduced as follows (MacKay, 1992; Tipping, 2004):
$$p(b \mid \alpha) = G\left(b \mid 0, [A(\alpha)]^{-1}\right) \qquad (4)$$
where $A(\alpha) = \mathrm{diag}\{\alpha_1, \alpha_2, \ldots, \alpha_{N_b}\}$ is the precision matrix and $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{N_b}]^T$ is the hyperparameter vector parameterizing the prior probability density function (PDF) of the weight parameter vector $b$.
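To make the role of the hyperparameters concrete, the sketch below evaluates the log-density of the prior in Eq. (4), assuming that G denotes a zero-mean Gaussian whose covariance is the inverse of the diagonal precision matrix. Under that assumption the density factorizes into independent univariate Gaussians, one per weight. The function name `log_ard_prior` is hypothetical.

```python
import numpy as np

def log_ard_prior(b, alpha):
    """Log-density of a zero-mean Gaussian prior with precision diag(alpha).

    Assumes b and alpha are vectors of equal length N_b and that every
    alpha_i is strictly positive.
    """
    b = np.asarray(b, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n_b = b.shape[0]
    # 0.5*log|A| - 0.5*N_b*log(2*pi) - 0.5*b^T A b, with A = diag(alpha)
    return (0.5 * np.sum(np.log(alpha))
            - 0.5 * n_b * np.log(2.0 * np.pi)
            - 0.5 * np.sum(alpha * b ** 2))

# Usage: for the same nonzero weight, a larger precision gives a lower
# prior density, i.e. the weight is pushed harder toward zero.
loose = log_ard_prior(np.array([1.0]), np.array([1.0]))
tight = log_ard_prior(np.array([1.0]), np.array([100.0]))
```

This is the mechanism by which a large value of an individual hyperparameter concentrates the prior of the corresponding weight around zero.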