ABSTRACT

Streaming analytics is one of the applications of big data. The technology has advanced the digital world in many ways, from simple customer analytics in a spreadsheet to mobile apps for travel planning and route guidance. This work presents various traffic forecasting methods that involve real-time data analytics. The primary objective of this chapter is to present the analytics behind streaming real-time traffic data from Twitter. The study aims to build an application that accepts a city name from the user and then retrieves all traffic-related tweets for that city within a period of seven days from the date of the search. The application can then alert the user to avoid routes where there is a blockage, an accident, or any other obstacle. The tweets are fetched from the Twitter API using Tweepy and are later classified as “traffic” or “nontraffic” by a model. Before classification, the tweets are preprocessed to improve the efficiency of the model and lemmatized to broaden the scope of the search and improve accuracy. The most essential step is building the model, which is preceded by vectorization, that is, representing each tweet as an array of numbers that the machine can process. The model is trained on a preclassified dataset and is then used to classify the tweets fetched earlier. This chapter presents the data processing pipeline for streaming analytics using the Twitter API. Toward the end, the chapter summarizes the results and directions for future research.
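
The preprocessing, vectorization, and classification steps summarized above can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not name the vectorizer or the model, so the bag-of-words counts and the multinomial naive Bayes classifier below are assumptions chosen for brevity. The function names and the six training tweets are hypothetical, and fetching through Tweepy and lemmatization are omitted so that the example runs with the Python standard library alone.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Lowercase the tweet, drop URLs and @mentions, and return its word tokens."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z]+", tweet)


def vectorize(tokens, vocabulary):
    """Bag-of-words: one count per vocabulary word, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def train(labeled_tweets):
    """Fit a multinomial naive Bayes model with add-one smoothing (assumed model)."""
    vocabulary = sorted({tok for text, _ in labeled_tweets for tok in preprocess(text)})
    labels = sorted({label for _, label in labeled_tweets})
    word_totals = {label: [0] * len(vocabulary) for label in labels}
    doc_counts = Counter()
    for text, label in labeled_tweets:
        doc_counts[label] += 1
        vector = vectorize(preprocess(text), vocabulary)
        word_totals[label] = [a + b for a, b in zip(word_totals[label], vector)]
    model = {}
    for label in labels:
        denominator = sum(word_totals[label]) + len(vocabulary)
        log_prior = math.log(doc_counts[label] / len(labeled_tweets))
        log_likelihoods = [math.log((c + 1) / denominator) for c in word_totals[label]]
        model[label] = (log_prior, log_likelihoods)
    return vocabulary, model


def classify(tweet, vocabulary, model):
    """Return the label with the highest log-posterior score for the tweet."""
    vector = vectorize(preprocess(tweet), vocabulary)
    scores = {
        label: log_prior + sum(n * ll for n, ll in zip(vector, log_likelihoods))
        for label, (log_prior, log_likelihoods) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical preclassified tweets, invented purely for illustration.
TRAINING = [
    ("Accident on the highway, heavy traffic jam near exit 4", "traffic"),
    ("Road blocked due to accident, avoid Main Street", "traffic"),
    ("Huge traffic jam on the bridge this morning", "traffic"),
    ("Loving the new coffee shop downtown", "nontraffic"),
    ("Great movie last night with friends", "nontraffic"),
    ("Happy birthday to my best friend", "nontraffic"),
]

vocabulary, model = train(TRAINING)
print(classify("Traffic jam after accident on Main Street", vocabulary, model))
print(classify("Coffee with my best friend", vocabulary, model))
```

Words that do not occur in the training vocabulary are simply ignored at classification time, which is one reason a lemmatization step such as the one the chapter describes can help: it maps inflected forms such as "blocked" and "blocking" to a shared entry.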