Forecasting Air Quality in India through an Ensemble Clustering Technique

doi:10.1201/9781003049548-6

ABSTRACT

Artificial intelligence decision making has evolved over the past few decades and gained further momentum in recent years. Machine learning has revolutionized techniques in many areas such as decision making, data trend analysis, and forecasting. Applications in the digital era generate voluminous data, which creates significant challenges in analyzing the data and deriving knowledge from it. This chapter focuses on the analysis of air quality data, based on air pollutant concentration information collected for 12 states in India. The major pollutants studied include CO, PM2.5, SO₂, and ozone. Monitoring air composition at different times across different cities is a computationally complex task. The authors' major focus is predicting the air quality and its trend for a particular location. Many supervised techniques are available that can find the relationship between pollutants and the air quality index. These models, however, fail to find an association between a city and its air quality. Clustering techniques can derive interesting patterns from the data and reveal relationships between different entities. Ensemble techniques are popular for deriving combined decisions from multiple learners, but very few attempts have been made to apply them to this problem. A modified consensus function is used to effectively aggregate the results from multiple base clustering methods. Empirical results show the improved performance of the ensemble clustering algorithm: on the Silhouette Coefficient, the Calinski-Harabasz Index, and the Davies-Bouldin (DB) index, it achieves improved scores of approximately 0.75, 2369.59, and 0.56, respectively. Visualization of these results helps to explain the effect of pollutants in various cities in India.
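The general idea of aggregating several base clusterings through a consensus function, and then scoring the result with the three indices named above, can be illustrated with a minimal sketch. It is not the chapter's modified consensus function: it assumes a plain co-association (evidence accumulation) consensus, uses synthetic data in place of the pollutant measurements, and the helper name `co_association` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for four pollutant features; NOT the chapter's data.
centers = [[0, 0, 0, 0], [6, 6, 6, 6], [-6, 6, -6, 6]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)


def co_association(labelings):
    """Fraction of base clusterings that place each pair of samples together."""
    labelings = np.asarray(labelings)
    n_samples = labelings.shape[1]
    co = np.zeros((n_samples, n_samples))
    for labels in labelings:
        co += labels[:, None] == labels[None, :]
    return co / len(labelings)


# Base clusterings: several k-means runs plus one agglomerative run (assumed choice).
base_labelings = [
    KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    for seed in range(5)
]
base_labelings.append(AgglomerativeClustering(n_clusters=3).fit_predict(X))

# Consensus step: treat 1 - co-association as a distance and cut an
# average-linkage dendrogram into at most three clusters.
co = co_association(base_labelings)
distance = squareform(1.0 - co, checks=False)
consensus = fcluster(linkage(distance, method="average"), t=3, criterion="maxclust")

# Internal validity indices: higher is better for the Silhouette Coefficient and
# the Calinski-Harabasz Index, lower is better for the Davies-Bouldin index.
sil = silhouette_score(X, consensus)
ch = calinski_harabasz_score(X, consensus)
db = davies_bouldin_score(X, consensus)
print(f"silhouette={sil:.2f}  calinski_harabasz={ch:.2f}  davies_bouldin={db:.2f}")
```

The scores printed here describe only the synthetic data and are not comparable to the values reported for the air quality data set.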