Data Analytics and Mining: Platforms for Real-Time Applications

doi:10.1201/9781003199403-5

ABSTRACT

The proliferation of the internet and advances in technology have led to the generation of enormous amounts of data at high speed from numerous sources such as IoT devices, clickstreams, social media, video surveillance systems, and various kinds of sensors. Such data is updated frequently and loses its value and relevance within a short time. It is therefore imperative to process and analyze such data on the fly, enabling time-critical decision making in numerous applications. Real-time querying and the processing of big data streams are the new requirements. Earlier batch processing frameworks such as Hadoop provide valuable insight into what has happened in the past but cannot deal with what is happening currently. Because of their slow response times and high latency, they are not well suited to handling dynamic real-time data, and they do not support making the right decisions and taking proper actions at the opportune time. Hence, additional tools are needed to cope with these new demands. This chapter covers newer technologies such as in-memory computing and stream processing and presents a brief overview of real-time architectures. It then discusses state-of-the-art real-time analytics platforms such as Apache Storm, Apache Spark, and Apache Flink that can be applied to real-time applications. Finally, it compares these platforms on essential features such as latency, throughput, and delivery guarantees, which can help in choosing a platform for a given application.