ABSTRACT

This chapter sets out the basic ideas of data analytics and illustrates them with examples from cybersecurity. Data analytics usually refers to the analysis of data for business applications. Exploratory data analysis (EDA) is the most important aspect of any data analysis. The simplest, most often used, and in many ways most powerful EDA tool is the scatterplot. Clustering, or unsupervised learning, takes unlabeled data and returns a grouping of the data. One popular method for reducing the dimensionality of a data set is random projection: the idea is to take random linear combinations of the features and use these in place of the original features. Data analytics thus provides algorithms for finding patterns, both “normal” and “unusual”, and tools for determining whether these patterns are likely to be spurious (the result of random variation) or indicative of something different in the data. It also provides tools for processing massive amounts of data and very high-dimensional data.
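
As a rough illustration of the random projection idea described above, the following minimal sketch replaces each observation's original features with a smaller number of random linear combinations of them. It is not taken from the chapter: the function name `random_projection`, the choice of Gaussian weights scaled by 1/sqrt(k), the seed values, and the synthetic data are all assumptions made for this example.

```python
import math
import random

def random_projection(rows, k, seed=0):
    """Map each d-dimensional row to k random linear combinations of its features.

    Hypothetical helper for illustration only. Weights are drawn from a
    standard normal distribution and scaled by 1/sqrt(k), one common choice
    that preserves squared lengths in expectation.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    # One weight vector of length d per output feature.
    weights = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
               for _ in range(k)]
    # Each new feature is the dot product of a weight vector with the row.
    return [[sum(w * x for w, x in zip(w_row, row)) for w_row in weights]
            for row in rows]

# Synthetic stand-in data (an assumption): 5 observations, 50 features each.
data_rng = random.Random(1)
data = [[data_rng.random() for _ in range(50)] for _ in range(5)]

# Reduce from 50 features to 10 random linear combinations.
reduced = random_projection(data, k=10)
```

Here `reduced` holds 5 rows of 10 values each, which can then be used in place of the original 50 features for plotting or clustering. Other weight distributions (for example sparse or ±1 weights) are also used in practice; the Gaussian choice here is only one option.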