ABSTRACT

With the continuing collection of so much data, attention has grown in using this data to discover new information. The subject of data mining has rapidly evolved to meet this need; it is the process of uncovering relationships between elements in large datasets. Data mining techniques take the input data and output a representation that expresses information contained in the data or knowledge about the data. Data mining plays an important role in many diverse disciplines, including retail business, climate science, bioinformatics, telecommunications, and computer security. Although data mining is usually considered to be part of computer science, it employs a wide range of mathematical concepts and techniques. This chapter discusses the important tasks in data mining, the key mathematical ideas used, and some of the important algorithms used in data mining. Clustering is a powerful tool for automated analysis of data. Clustering is ubiquitous, with applications in the natural sciences, psychology, medicine, engineering, economics, marketing, and other fields.