ABSTRACT

Data science is an interdisciplinary field concerned with methodical approaches to processing large volumes of data, both structured and unstructured. Its objective is to analyze data to uncover hidden patterns and extract actionable insights that support better managerial decision-making in an organization. Data science has been applied in diverse areas such as business and finance, marketing, risk management, operations and planning, disease diagnosis and health care, agriculture, fraud detection, crime investigation, image and speech recognition, gaming, virtual reality, weather and environmental studies, and space and defense applications, to name a few.

Data science is not an entirely new discipline; rather, it has evolved from existing fields such as data mining and knowledge discovery, business intelligence, data analytics, machine learning, computer science, software engineering, mathematics and statistics, among others. It serves as an umbrella for these fields, which together make data processing more systematic than before and more useful for organizational decision-making. Data science has considerable potential to solve complex organizational problems effectively. With the growth of social media, the Internet of Things, ubiquitous computing, connectivity, ambient intelligence and, above all, the digital economy, big data has emerged as both an opportunity and a challenge for many organizations. While big data holds many business opportunities, making it useful for the organization is challenging. In this context, embracing data science becomes all the more pertinent. With the advent of big data, the importance and popularity of data science are growing rapidly.

This chapter will provide a comprehensive introduction to data science and big data analytics. It will elaborate on the data analytics life cycle. The chapter will delve into theories and methods used in data science, such as regression, classification, clustering and association rules. It will also introduce relevant technologies such as MapReduce and NoSQL, and popular tools such as the Hadoop ecosystem. Finally, the chapter will conclude with research challenges in the field of data science and big data analytics.