ABSTRACT

Discovering a hidden structure in data is one of the cornerstones of modern data analysis. Because of the diversity and complexity of modern datasets, this is a very challenging task and the role of efficient algorithms is of paramount importance in this context. The majority of available datasets are in raw and unstructured form, consisting of example points without corresponding labels. A large class of unlabeled datasets can be modeled as samples from a probability distribution over a very large domain. An important goal in the exploration of these datasets is understanding the underlying distributions.