ABSTRACT

Keywords: Frequent patterns, Uncertain data, Vertical mining, Tidset, Diset, Association rules, Data mining

Introduction Frequent pattern mining has been a focused theme in data mining research for over a decade. It is a core technique used in many mining tasks like sequential pattern mining, structured pattern mining, correlation mining, associative classication, and frequent pattern-based clustering (C. Zhu, X. Zhang, J. Sun, and B. Huang, 2009), as well as their broad applications (H. Kriegel, P. Kroger and A. Zimek, 2009) (Y. Koh, N. Rountree, R. O’Keefe, 2008) (A.Ceglar and J. Roddick, 2006). So, a great eort has been dedicated to this research and tremendous progress has been made to develop ecient and scalable algorithms for frequent pattern mining (A. Ceglar and J. Roddick, 2006) (Z. Zheng, R. Kohavi, and L. Mason, 2001) (J. Han, H. Cheng, D. Xin, and X. Yan, 2007). All these algorithms deal with precise data sets (J. Han, J. Pei, Y. Yin and R. Mao, 2004) (M. Zaki, 2000) (P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa and D. Shah, 2000) (M. Zaki and K. Gouda, 2003)(W. Consue, and W. Kurutach, 2003) (B. Goethals, 2004)(M. Song, S. Rajasekaran, 2006). Such data is characterized by known and denite existence of the items or events in the transactions. However, there are datasets where the exact existence of items in the transactions cannot be gained. ese datasets are called uncertain data. e existence of an item in a transaction is best captured by a likelihood measure or a probability (Chui, C.-K., Kao, B., Hung, E.). As an example, a medical dataset may contain a table of patients’ records, each of which contains a set of diseases that a patient suers. In such case the physician may highly suspect (but cannot guarantee) that a patient suers from a specic disease. So he expresses his suspection by a probability of the existence of such disease (H., Li, H., Yang, Q. 2007). Another example of uncertain dataset is pattern recognition applications. Given a satellite picture, image processing techniques can be applied to extract features that indicate the presence or absence of certain target objects (such as bunkers). Due to noises and limited resolution, the presence of a feature in a spatial area is often uncertain and expressed as a probability (Dai, X., Yiu, M.L., et al. 2005). Figure 1 shows an example of precise and uncertain data sets. Few algorithms have been dedicated for mining frequent patterns from uncertain data. All these algorithms follow the horizontal data representation.