ABSTRACT
Ever-accelerating climate change is causing dramatic increases in the scale and frequency of wildfires in many parts of the world, most notably in the western United States over the last few decades. In response to this growing severity, scientists are working to better predict and model wildfire behavior using both traditional and newer methods. However, modeling wildfire requires large volumes of heterogeneous data spanning multiple decades. For example, data commonly used for wildfire prediction include measured features such as temperature, precipitation and vegetation, along with unstructured data such as satellite imagery and weather maps. Given the size of the data and the complexity of the task, building models of weather or wildfire patterns has traditionally required supercomputers that are not generally accessible to the public, which puts this research out of reach for individuals and smaller organizations.
This work introduces a scalable infrastructure for data processing, storage and machine learning that is specialized for wildfire prediction. The infrastructure is built from readily available tools on a public cloud and can ingest multiple terabytes of data to train various types of modern machine learning models, such as regularized gradient boosting and neural networks. Using this infrastructure, this paper demonstrates the training pipeline by building models that estimate wildfire risk in California while accounting for the effects of climate change. The experiments show that our infrastructure can train models on large (~TB-scale) volumes of data and automatically improve model performance from 0.75 to 0.80 AUC, while achieving storage savings of up to 59%, all without expensive supercomputers. These results suggest that it is suitable as a wildfire predictor that can help mitigate wildfire damage in high-risk areas and ultimately save human lives in a cost-, resource- and time-efficient way.
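The AUC figures above refer to the area under the ROC curve, which can be read as the probability that a randomly chosen positive example (e.g., a location-time cell where a fire occurred) receives a higher risk score than a randomly chosen negative one. The paper does not state how AUC is computed in its pipeline. The following is therefore only a minimal sketch of that pairwise-ranking interpretation; `roc_auc` is a hypothetical helper, not part of the described infrastructure.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC as the fraction of (positive, negative)
    pairs in which the positive example is scored higher (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels (1 = fire, 0 = no fire) and model risk scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This quadratic loop is meant for illustration only. On terabyte-scale data, a sort-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.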
