ABSTRACT

There are several problems with current state-of-the-art solutions. First, the predictive accuracy of proposed solutions such as the Markov model is low; for example, the maximum training accuracy is 41%. Second, association rule mining (ARM) and longest repeating subsequence (LRS) pattern extraction predict by choosing the path with the highest probability in the training set. As a result, any new surfing path is misclassified, because the probability of such a path occurring in the training set is zero. Third, the sparse nature of the user sessions used in training can result in unreliable predictors. Finally, many previous methods have ignored domain knowledge as a means of improving prediction. Domain knowledge plays a key role in improving predictive accuracy because it can be used to eliminate irrelevant classifiers during prediction or to reduce their influence by assigning them lower weights. In Part III of this book, we will describe our algorithms for Web page prediction. In this chapter, we will present our hybrid model. For details of related work, we refer the reader to [1, 2].
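The zero-probability problem with unseen surfing paths can be made concrete. The following is a minimal sketch, assuming a simple frequency-based predictor that picks the most probable next page given the last `order` pages visited. It is not the method of this book and not a full ARM or LRS implementation. The functions `train` and `predict` and the sample sessions are hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Context = Tuple[str, ...]


def train(sessions: List[List[str]], order: int = 2) -> Dict[Context, Counter]:
    """Count how often each page follows each length-`order` path prefix.

    Hypothetical helper: a plain frequency model standing in for
    probability-based pattern predictors such as ARM or LRS.
    """
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for session in sessions:
        for i in range(order, len(session)):
            context = tuple(session[i - order:i])
            counts[context][session[i]] += 1
    return counts


def predict(counts: Dict[Context, Counter], path: List[str],
            order: int = 2) -> Optional[str]:
    """Return the most frequent next page for the last `order` pages of `path`."""
    context = tuple(path[-order:])
    if context not in counts:
        # Unseen path: every candidate next page has an estimated
        # probability of zero, so there is nothing to choose from.
        return None
    page, _ = counts[context].most_common(1)[0]
    return page


# Hypothetical training sessions (sequences of visited pages).
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart", "home"],
    ["home", "products", "reviews"],
]
model = train(sessions)
print(predict(model, ["home", "products"]))  # seen path: "cart" (2 of 3 cases)
print(predict(model, ["home", "support"]))   # new path: None
```

A path that appears in training yields a confident prediction, while a new path such as `home -> support` yields no prediction at all, which is the failure mode described above for probability-based pattern predictors.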