Chapter 18
A Survey of Uncertain Data Clustering Algorithms
By Charu C. Aggarwal
Pages 26

Many collected data sets have uncertainty built into them. In many cases, this uncertainty can be readily measured and recorded along with the data. When this is the case, the uncertainty can be used to improve the results of data mining algorithms, because it provides a probabilistic measure of the relative importance of different attributes and thereby guides how those attributes are used during the mining process. Some examples of real applications in which uncertainty may be used are as follows:

Imprecise instruments and hardware are sometimes used to collect the data. In such cases, the level of uncertainty can be measured by prior experimentation. A classic example is sensor hardware, whose measurements are often imprecise.

The data may be imputed by statistical methods, such as forecasting. In such cases, the uncertainty may be inferred from the methodology used to perform the imputation.

Many privacy-preserving data mining techniques use probabilistic perturbations [11] to reduce the fidelity of the underlying data. In such cases, the uncertainty may be available as an end result of the privacy-preservation process. Recent work [5] has explicitly connected the problem of privacy preservation with that of uncertain data mining and has proposed a method that generates data amenable to uncertain data mining methods.
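As a rough illustration of the last example, the sketch below perturbs each attribute with zero-mean Gaussian noise of a known standard deviation, so that the uncertainty of every released value is available as a by-product. It is a minimal sketch under stated assumptions, not the method of [11] or [5]: the function name `perturb`, the Gaussian noise model, and the choice to report standard deviations as the uncertainty are all assumptions made for this example.

```python
import random


def perturb(records, sigmas, seed=0):
    """Add zero-mean Gaussian noise to every attribute of every record.

    records: list of rows, each a list of numeric attribute values.
    sigmas:  one noise standard deviation per attribute (assumed known).
    Returns (noisy_records, uncertainty), where uncertainty[i][j] is the
    standard deviation of the noise added to attribute j of record i.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    noisy = [
        [x + rng.gauss(0.0, s) for x, s in zip(row, sigmas)]
        for row in records
    ]
    # The perturbation level is known by construction, so it can be
    # published alongside the data as the per-value uncertainty.
    uncertainty = [list(sigmas) for _ in records]
    return noisy, uncertainty


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    noisy, unc = perturb(data, sigmas=[0.1, 2.0])
    for row, u in zip(noisy, unc):
        print(row, u)
```

An uncertain clustering algorithm could then treat the second attribute, which carries the larger standard deviation, as less reliable than the first.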