ABSTRACT

The availability of distributed computing and storage systems has made Big Data processing a reality. The term Big Data refers to data that cannot be processed by a traditional system, owing to its limited resources. Among the many problems addressed by Big Data systems, those classified under streaming data have brought unparalleled challenges and opportunities to industry and research institutes. “Streaming data” (or data pipelines), as we use the term, refers to an unbounded sequence of data that arrives continuously, at varying rates, over a very long duration. Although stream processing has its origins in the 1970s, most of the work in this field has been done in the last decade. Applications such as the Internet of Things (IoT), wireless sensor networks (WSNs), radio access networks (RANs), and cellular systems (GSM, 3G, and 4G LTE) have driven the adoption of stream processing in Big Data environments, owing to the velocity and variety of the data they produce and the quality of service (QoS) they demand. Moreover, these data pipelines generate massive amounts of data at regular intervals, making them natural candidates for the Big Data domain.