ABSTRACT

In traditional High-Utility Itemset Mining (HUIM), the utility of an itemset is defined as the sum of its utilities in the database. An important problem with this definition is that it does not take itemset length into account. To provide a better assessment of each itemset’s utility, the task of High Average-Utility Itemset Mining (HAUIM) was proposed by Hong et al. (Hong, Lee, & Wang 2009). The average-utility measure addresses the bias of traditional HUIM toward larger itemsets by considering itemset length, and can thus assess the utility of itemsets more objectively. As with traditional HUIM, several algorithms have been designed for HAUIM (Hong, Lee, & Wang 2009, Lan, Hong, & Tseng 2012, Lin, Hong, & Lu 2010). In this paper, we first design an efficient Average-Utility (AU)-list structure and develop
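The difference between the two measures can be illustrated with a minimal sketch. It assumes the usual formulation in which the utility of an item in a transaction is its purchase quantity times its unit profit, and the average utility of an itemset is its utility divided by the number of items it contains. The toy database, the profit table, and the function names below are hypothetical and serve only as an illustration; this is not the AU-list structure or the HAUI-Miner algorithm.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps an item to its purchase quantity.
TRANSACTIONS: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]

# Hypothetical unit profit of each item.
UNIT_PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def utility(itemset: Iterable[str],
            transactions: List[Dict[str, int]],
            unit_profit: Dict[str, int]) -> int:
    """Sum the utility of the itemset over every transaction that contains all of it."""
    items = set(itemset)
    total = 0
    for transaction in transactions:
        if items.issubset(transaction):
            total += sum(transaction[i] * unit_profit[i] for i in items)
    return total


def average_utility(itemset: Iterable[str],
                    transactions: List[Dict[str, int]],
                    unit_profit: Dict[str, int]) -> float:
    """Divide the utility by the itemset length to remove the bias toward longer itemsets."""
    items = set(itemset)
    return utility(items, transactions, unit_profit) / len(items)


if __name__ == "__main__":
    for candidate in (["a"], ["a", "b"], ["a", "b", "c"]):
        u = utility(candidate, TRANSACTIONS, UNIT_PROFIT)
        au = average_utility(candidate, TRANSACTIONS, UNIT_PROFIT)
        print(candidate, u, round(au, 2))
```

Under a single minimum threshold, a longer itemset accumulates utility from more items per transaction, whereas the average utility normalizes by length so that itemsets of different sizes are compared on the same scale.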

1 INTRODUCTION

Mining Frequent Itemsets (FIs) or Association Rules (ARs) in transactional databases is a fundamental task in Knowledge Discovery in Databases (KDD) (Agrawal & Srikant 1994, Agrawal, Imielinski, & Swami 2005). The most common ways of deriving FIs or ARs from a database are the level-wise approach (Agrawal & Srikant 1994) and the pattern-growth approach (Han, Pei, Yin, & Mao 2004). Traditional Frequent Itemset Mining (FIM) and Association Rule Mining (ARM) algorithms, however, consider only the occurrence frequencies of items in binary databases; other important factors such as the quantities, profits, and weights of items are not taken into account. Thus, High-Utility Itemset Mining (HUIM) has emerged as a critical issue in recent decades, as it can reveal the profitable itemsets in real-world situations (Liu & Qu 2012, Liu, Liao, & Choudhary 2005, Yao, Hamilton, & Butz 2004). HUIM can be considered as an extension of FIM that considers additional information such

an HAUI-Miner algorithm for mining the HAUIs without candidate generation. The key contributions of this paper are threefold.