ABSTRACT

Cyber security systems have become critically important at all levels, including the data, host, network, and application levels. Cyber security refers to a combination of technologies, processes, and operations designed to protect information systems, computers, devices, programs, data, and networks from internal or external threats, harm, damage, attacks, or unauthorized access, such as ransomware or denial-of-service attacks. Machine learning (ML) algorithms, as part of artificial intelligence, can be grouped into supervised, unsupervised, semisupervised, and reinforcement learning algorithms. The main characteristic of ML is the automatic analysis of large data sets and the production of models that capture the general relationships found in the data. This research evaluates machine learning and big data analytics paradigms for use in cyber security. Pragmatism, a paradigm congruent with mixed methods research (MMR), was used as the research philosophy. Pragmatism epitomizes the congruity between knowledge and action.

Big data has necessitated the development of big data mining tools and techniques, widely referred to as big data analytics. The information evaluated in big data analytics includes a mix of unstructured and semistructured data, such as social media content, mobile phone records, web server logs, and internet clickstream data. Big data analytics makes use of analytic techniques such as data mining, machine learning, artificial intelligence, statistics, and natural language processing. In analyzing the different data analytics models for cyber security, the researcher refers to the characteristics of an ideal data analytics model for cyber security. Sources of big data about cyber security have been extended to include computer-based data, mobile-based data, physical data of users, human resources data, credentials, one-time passwords, digital certificates, biometrics, and social media data. The basic framework for the big data analytics model for cyber security consists of three major components: big data, analytics, and insights. Security data are characterized by heterogeneous formats, diverse semantics, and correlation across data sources, and are classified into categories such as nonsemantic data, semantic data, and security knowledge data. Cloud computing service providers typically have advanced big data analytics models, with advanced detection and prediction algorithms, state-of-the-art cyber security technologies, and better protocols, because they specialize in data and networks.