Chapter 14: Data Stream Classification with Limited Labeled Training Data

ABSTRACT

This chapter describes a proposed solution to the problem of limited labeled training data. It presents the techniques involved and explores training with limited labeled data and ensemble classification. The proposed technique is ReaSC, which stands for Realistic Data Stream Classifier. Before describing ReaSC, the authors informally define the data stream classification problem. The classification model is a collection of K microclusters obtained using semisupervised clustering. Training consists of two basic steps: semisupervised clustering and storing the cluster summaries as microclusters. When more training instances are provided to a model, its classification error is likely to decrease. The ensemble training process consists of three main steps: creating clusters using E-M; refining the ensemble; and updating the ensemble. Intuitively, increasing the ensemble size helps to reduce error, and a significant improvement is achieved by increasing the ensemble size from 1 to 2. While many ensemble approaches are based on supervised learning algorithms, the authors' approach is based on a semisupervised algorithm.
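
The idea of storing cluster summaries as microclusters and classifying with an ensemble of such models can be illustrated with a minimal Python sketch. It is not the ReaSC implementation: the cluster assignments are taken as given rather than computed by semisupervised clustering, and the names `MicroCluster`, `summarize`, `classify`, and `ensemble_classify`, the choice of summary fields (centroid, point count, per-class label counts), the nearest-centroid rule, and the majority vote are all assumptions made for illustration. Unlabeled instances are marked with `None`.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass
class MicroCluster:
    """Hypothetical cluster summary: the raw points are not kept."""
    centroid: tuple
    n_points: int
    label_counts: Counter  # counts of labeled points only

    def majority_label(self):
        # Most frequent label among the labeled members, or None if there are none.
        return self.label_counts.most_common(1)[0][0] if self.label_counts else None


def summarize(points, labels, assignments):
    """Build one microcluster per cluster id from precomputed assignments."""
    groups = {}
    for point, label, cluster_id in zip(points, labels, assignments):
        groups.setdefault(cluster_id, []).append((point, label))
    model = []
    for members in groups.values():
        pts = [p for p, _ in members]
        centroid = tuple(sum(col) / len(pts) for col in zip(*pts))
        counts = Counter(y for _, y in members if y is not None)
        model.append(MicroCluster(centroid, len(pts), counts))
    return model


def classify(model, x):
    """Assumed rule: label of the nearest microcluster that has labeled members.

    Requires at least one such microcluster in the model.
    """
    labeled = [mc for mc in model if mc.label_counts]
    nearest = min(labeled, key=lambda mc: dist(mc.centroid, x))
    return nearest.majority_label()


def ensemble_classify(ensemble, x):
    """Assumed combination rule: majority vote over the models in the ensemble."""
    votes = Counter(classify(model, x) for model in ensemble)
    return votes.most_common(1)[0][0]
```

For example, with points `[(0, 0), (0, 1), (5, 5), (5, 6)]`, labels `["a", None, None, "b"]`, and assignments `[0, 0, 1, 1]`, `summarize` produces two microclusters, and a test point near the first centroid is labeled `"a"` even though only one of the two points in that cluster carried a label. This is the sense in which a small number of labels can still yield a usable model.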