ABSTRACT

This chapter discusses insider threat detection for sequence data and describes how such data can be classified. It explores both supervised and unsupervised learning techniques for mining data streams for sequence data, and explains aspects of anomaly detection and complexity analysis. Sequence data related to insider threat detection is stream-based in nature: it is typically accumulated over many years of organization and system operations, and is therefore best characterized as an unbounded data stream. The data stream is assumed to be converted into a number of chunks. For example, each chunk may represent a week and contain the sequence data that arrived during that time period. If there were no concept drift in the data stream, the decision boundary would be the same for both the current chunk and its previous chunk. The chapter focuses on the time complexity of Quantized Dictionary construction.
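
The weekly chunking described above can be illustrated with a minimal Python sketch. It is an illustration only, not the chapter's implementation: the function name chunk_by_week, the (timestamp, command) event format, and the sample commands are all assumptions made for the example, and a finite list stands in for what would in practice be an unbounded stream processed incrementally.

```python
from collections import defaultdict
from datetime import datetime


def chunk_by_week(events):
    """Group (timestamp, command) pairs into weekly chunks.

    Hypothetical helper: chunks are keyed by ISO (year, week) and
    returned in chronological order, one list of commands per week.
    """
    chunks = defaultdict(list)
    for timestamp, command in events:
        iso = timestamp.isocalendar()  # (ISO year, ISO week, ISO weekday)
        chunks[(iso[0], iso[1])].append(command)
    return [chunks[key] for key in sorted(chunks)]


# Assumed sample data: three commands spanning two consecutive weeks.
events = [
    (datetime(2024, 1, 1, 9, 0), "ls"),   # Monday of ISO week 1
    (datetime(2024, 1, 3, 10, 0), "cd"),  # same week
    (datetime(2024, 1, 8, 9, 0), "vi"),   # Monday of ISO week 2
]
weekly_chunks = chunk_by_week(events)
```

Here weekly_chunks holds one command sequence per week, so a model trained on one chunk can be compared against the next chunk to check whether the decision boundary has shifted.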