ABSTRACT

When a contaminant is accidentally spilled into a natural river, the source must be identified to minimize damage to both aquatic life and the humans who depend on the river system. However, back-tracking the pollution source is an ill-posed problem because observed data are scarce and the mixing processes in natural rivers are complex. We therefore propose new data-driven models to identify the pollution source location and the spill mass discharge in a river network system. First, a large number of numerical simulations were run with the Transient Storage zone Model (TSM) to build a database of chemical spill scenarios covering many spill cases under various hydraulic conditions. Then, to reduce the dimensionality of the dataset, each breakthrough curve (BTC) in the database was summarized as 11 features representing the mixing characteristics of the contaminant in the stream. Using Recursive Feature Elimination (RFE) as the feature selection method, the relevant BTC features were selected for each data-driven model and evaluated in terms of relative feature importance. Finally, Random Forest (RF) and Support Vector Machine (SVM) models, each combined with RFE, were compared; RF showed superior performance in predicting both spill location and spill mass discharge. Furthermore, the RFE results show that the relevant features differ between the data-driven models.
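
As an illustration of the dimension-reduction step, the sketch below summarizes one BTC, given as sampled times and concentrations, with a handful of scalar descriptors. The abstract does not list the 11 features used in the study, so the five computed here (peak concentration, time to peak, area under the curve, centroid time, and temporal variance) and the function name `btc_features` are assumptions chosen for illustration, not the authors' feature set.

```python
from typing import Dict, Sequence


def btc_features(t: Sequence[float], c: Sequence[float]) -> Dict[str, float]:
    """Summarize a breakthrough curve c(t) with a few scalar descriptors.

    The feature choice is hypothetical; the source does not specify its 11 features.
    """
    if len(t) != len(c) or len(t) < 2:
        raise ValueError("t and c must have equal length of at least 2")

    def trapz(f: Sequence[float]) -> float:
        # Trapezoidal integration of f over the sampling times t.
        return sum(0.5 * (f[i] + f[i + 1]) * (t[i + 1] - t[i])
                   for i in range(len(t) - 1))

    area = trapz(c)  # zeroth temporal moment of the curve
    if area <= 0:
        raise ValueError("BTC must have positive area")

    # First normalized moment: mean arrival time of the contaminant cloud.
    centroid = trapz([ti * ci for ti, ci in zip(t, c)]) / area
    # Second central moment: temporal spread of the cloud.
    variance = trapz([(ti - centroid) ** 2 * ci for ti, ci in zip(t, c)]) / area

    i_peak = max(range(len(c)), key=lambda i: c[i])
    return {
        "peak_concentration": c[i_peak],
        "time_to_peak": t[i_peak],
        "area": area,
        "centroid_time": centroid,
        "temporal_variance": variance,
    }


# Usage with a small synthetic, triangular BTC (not data from the study).
features = btc_features([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 1.0, 0.0])
print(features)
```

Feature vectors of this kind, one per simulated spill case, would then form the input table on which a feature selection method such as RFE and models such as RF or SVM operate.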