ABSTRACT

Over the last decade, high-utility pattern mining has emerged as an active research topic in the field of data mining. As data volumes grow rapidly, there is a strong need for scalable and efficient mining techniques in this area. For example, 48 h of video is uploaded to YouTube every minute, there are currently 1.97 billion Internet users worldwide, and unstructured data are growing at a rate of 80% per year. High-utility pattern mining may be considered an extension of frequent-pattern mining, which considers only how frequently itemsets occur. In some cases, the frequent itemsets contribute only a small portion of the overall profit, whereas infrequent itemsets contribute a large portion. High-utility pattern mining discovers more valuable knowledge from transaction databases by treating the differing values of individual items as utilities. Utility mining is more complex than frequent-pattern mining. It has many applications in retail-chain data analysis, online analytical processing, network traffic analysis, web-server log and click-stream mining, telecommunication data analysis, e-business and stock data analysis, sensor network data analysis, and so forth.
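
The contrast between frequency and utility can be illustrated with a minimal sketch. The transactions, item names, and unit profits below are hypothetical toy values, not taken from the retail data set, and the definition used (purchased quantity times unit profit, summed over the transactions that contain the itemset) is the common textbook formulation, assumed here for illustration only.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 1, "milk": 2},
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 1, "caviar": 1},
    {"bread": 3},
    {"caviar": 2, "milk": 1},
]

# Hypothetical unit profit of each item (its external utility).
unit_profit = {"bread": 1, "milk": 2, "caviar": 50}


def support(itemset, db):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(item in t for item in itemset))


def utility(itemset, db, profit):
    """Sum quantity * unit profit of the itemset's items over all
    transactions that contain the whole itemset."""
    total = 0
    for t in db:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


for itemset in [("bread", "milk"), ("caviar",)]:
    print(itemset, support(itemset, transactions),
          utility(itemset, transactions, unit_profit))
```

In this toy data, the itemset {bread, milk} occurs in three transactions but yields a utility of 12, whereas {caviar} occurs in only two transactions and yields a utility of 150, so a frequency threshold alone would rank them in the opposite order of their profit.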

In this chapter, we select four significant algorithms in this area: generation of temporal maximal utility itemsets from data streams using landmark window (GUIDE [LM]), high-utility itemset miner (HUI-Miner), high-utility mining using maximal itemset property (UMMI), and the Two-Phase algorithm. The selection is based on the following concepts: number of database scans, pruning strategies, and summary structures. A case study on the retail data set, with a real application, is also presented. The main purpose of this chapter is to show the performance and usefulness of the chosen algorithms through an analysis of the retail data set, so that more efficient algorithms can be designed. Moreover, the chapter will help researchers interested in the area learn why all past attempts have failed to discover high-utility patterns.