ABSTRACT

Data mining is a process in which critical business data are analyzed to gain new insights about customers, businesses, and markets. This knowledge can be used to improve customer relationships and to produce better-quality products and services, which can result in higher revenues and profits. These data are generally kept in relational or multidimensional formats and stored in companies' central data warehouses. As enterprises have evolved, however, a more diverse set of data structures has come into use: graph data, which may be drawn from social networking sites; time series data; longitudinal data; semistructured data, such as XML; unstructured data; and big data. Storing all of these diverse data requires different data repositories, and analytics is carried out on the data held in them. Access to these repositories is strictly governed by access control rights, and strict security measures are employed because the data are highly sensitive and contain customer-identifying information. In addition to these security measures, companies ensure that the data are anonymized before they are used for analytics or mining. Increasingly, companies also share their data with specialized analytics firms, and the data must be protected before they are shared.