ABSTRACT

Data mining flourishes because the information influx in ubiquitous applications calls for data management, pattern recognition and classification, and knowledge discovery. Cyberinfrastructures generate peta-scale data sets for daily monitoring and pattern profiling in cybersecurity models. To facilitate the application of data-mining techniques in cybersecurity protection systems, we comprehensively study the classic data-mining and machine-learning paradigms. In Section 2.1, we introduce the fundamental concepts of machine learning. We categorize classic machine-learning methods into supervised learning and unsupervised learning, and present the respective methodologies, which will be used in cybersecurity techniques. In Section 2.2, we highlight a variety of techniques, such as resampling, feature selection, cost-effective learning, and performance evaluation metrics, that can be used to improve and evaluate the quality of machine-learning methods in mining cyberinfrastructure data. Because malicious behaviors occur rarely in cyberinfrastructures, classic machine-learning methods must adopt these techniques to learn accurately from unbalanced data. In Section 2.3, we address several challenges that arise when we apply the classic data-mining and machine-learning methods to knowledge discovery in cyberinfrastructures. Finally, in Section 2.4, we summarize the emerging research directions in machine learning for cybersecurity.