Big data mining: A classification perspective

doi:10.1201/9781315375083-102

ABSTRACT

Mining and discovering meaningful knowledge from big data for decision-making, prediction, and other purposes is extremely challenging because of the characteristics of such data. Knowledge Discovery (KD) is the process of discovering useful knowledge from a collection of data. Major KD application areas include marketing, manufacturing, fraud detection, telecommunications, education, medicine, Internet agents, and many others [6, 7]. Data mining is the core step of the KD process, in which algorithms are applied to extract useful patterns from data. Tasks in data mining can be classified into

1 INTRODUCTION

With the rapid development of Internet communication and collaboration, the Internet of Things, and cloud computing, data have become available in very large volumes (petabytes or more). Such data come from a wide variety of sources and formats, including social networking interactions, web pages, click streams, online transactions, emails, videos, audio, images, posts, search queries, health records, scientific data, sensors, smartphones and their applications, and so on [1]. According to the 2014 IDC ‘Digital Universe Study’ [2], 130 exabytes (EB) of the world’s data were created and stored in 2005. The amount grew to 4.4 zettabytes (ZB) in 2013. It is doubling in size every two years and is projected to reach 44 ZB in 2020 [2]. In 2012, IBM estimated that 2.5 quintillion bytes (2.5 EB) of data were created daily [3].
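As a rough sanity check on the figures quoted above, the short sketch below computes the doubling time implied by growth from 130 EB in 2005 to 44 ZB in 2020, and converts the daily estimate of 2.5 quintillion bytes into a yearly volume. It assumes decimal SI units (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes), smooth exponential growth, and a 365-day year. The function and variable names are illustrative only and do not come from the cited studies.

```python
import math

# Decimal SI units (assumption: the cited studies use powers of ten).
EB = 10**18  # bytes in one exabyte
ZB = 10**21  # bytes in one zettabyte


def doubling_time_years(start_bytes, end_bytes, years):
    """Doubling time implied by exponential growth between two data points."""
    growth_factor = end_bytes / start_bytes
    return years * math.log(2) / math.log(growth_factor)


# Figures quoted in the text: 130 EB in 2005, projected 44 ZB in 2020.
implied_doubling = doubling_time_years(130 * EB, 44 * ZB, 2020 - 2005)

# Daily estimate quoted in the text: 2.5 quintillion (2.5e18) bytes per day.
daily_bytes = 2.5e18
yearly_zb = daily_bytes * 365 / ZB

print(f"Implied doubling time: {implied_doubling:.2f} years")
print(f"Yearly volume at 2.5e18 bytes/day: {yearly_zb:.4f} ZB")
```

Under these assumptions the implied doubling time comes out slightly under two years, which is broadly in line with the statement that the volume doubles every two years.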