Conclusion for Part VI | 34 | Data Mining Tools for Malware Detection

ABSTRACT

We have presented a novel technique for detecting new classes in concept-drifting data streams. Most novelty detection techniques either assume that there is no concept-drift or build a model for a single “normal” class and treat all other classes as novel. Our approach, by contrast, can detect novel classes in the presence of concept-drift, even when the model consists of multiple “existing” classes. In addition, our novel class detection technique is non-parametric, meaning that it does not assume any specific distribution of the data. We also show empirically that our approach outperforms state-of-the-art data stream-based novelty detection techniques in both classification accuracy and processing speed. It might appear to readers that, to detect novel classes, we are in fact examining whether new clusters are being formed, and that the detection process could therefore proceed without supervision. But supervision is necessary for classification. Without external supervision, two separate clusters could be regarded as two different classes even though they belong to the same class. Conversely, if more than one novel class appears in a chunk, all of them could be regarded as a single novel class if the labels of those instances are never revealed. In the future, we would like to apply our technique in the domain of multiple-label instances.
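The point about supervision can be illustrated with a toy sketch. The code below is not the technique presented in this part; it assumes a hypothetical one-dimensional chunk and a simple gap-based grouping (the names `gap_clusters`, `chunk`, and the threshold `gap=1.0` are illustrative assumptions). It shows that cluster structure alone can split one class into two clusters and merge two classes into one cluster, so the true labels are needed to tell the cases apart.

```python
def gap_clusters(points, gap):
    """Group 1-D points: start a new cluster whenever the distance
    between consecutive sorted points exceeds `gap` (toy assumption)."""
    ordered = sorted(points)
    clusters = [[ordered[0]]]
    for p in ordered[1:]:
        if p - clusters[-1][-1] > gap:
            clusters.append([p])
        else:
            clusters[-1].append(p)
    return clusters

# Hypothetical chunk of (feature value, true label) pairs.
chunk = [
    (0.1, "A"), (0.2, "A"), (0.3, "A"),              # first cluster of class A
    (5.0, "A"), (5.1, "A"), (5.2, "A"),              # second cluster, still class A
    (9.0, "B"), (9.1, "C"), (9.2, "B"), (9.3, "C"),  # classes B and C overlap
]

values = [v for v, _ in chunk]
label_of = dict(chunk)  # the labels that external supervision would reveal

clusters = gap_clusters(values, gap=1.0)
n_clusters = len(clusters)

# True classes present in each cluster, visible only once labels are revealed.
classes_per_cluster = [sorted({label_of[v] for v in c}) for c in clusters]

print(n_clusters)
print(classes_per_cluster)
```

Without the labels, an unsupervised observer would read the first two clusters as two different classes and the third cluster as a single class; the revealed labels show that neither reading is correct.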