ABSTRACT

We compare two extensions of principal component analysis (PCA) to distributional variables, and we introduce a generalization strategy for the case where domain experts want to perform histogram PCA on several distributional datasets. Most existing approaches consider only the situation where users have a single dataset and, from a technical standpoint, rely on either first-order moments or quantiles. In this context, we present two histogram PCA methods, based respectively on the barycenters of the distributions and on the average correlation matrix induced by the quantiles of the distributions. We review the benefits and drawbacks of using barycenters versus quantiles, and we present a generalization framework for the case where there is more than one histogram dataset. The methods described are applicable in many domains, such as people analytics, risk and control management, internal audit, anti-money laundering, healthcare analytics, and sports analytics.