ABSTRACT

We present a digital soil mapping methodology for modeling topsoil (0–5 cm) pH across the Mexican territory at a contextual resolution of 1 km². Predictions were made from ~8500 legacy soil profiles using three predictive algorithms: random forest, kknn and Cubist. Each algorithm was run in a loop of up to 100 repetitions; every 10 repetitions we tested whether the residuals showed a stable trend with a slope near zero and, if so, stopped the iteration. RMSE and Pearson's r were estimated as validation metrics for each realization. We also present a novel pixel-wise ensemble in which the median of the set of predictions with the lowest standard error [sd] is carried to the final map. Additional maps of the “best algorithm” and its sd, as a proxy of uncertainty, were also created. Results show that random forest yields the best overall validation metrics (r = 0.7 and RMSE = 1.1) and dominates the prediction in more than 49% of the territory.
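
The pixel-wise ensemble can be illustrated with a minimal sketch. It assumes that each algorithm contributes one set of predictions per pixel (one value per realization), that "sd" is the sample standard deviation across those realizations, and that maps are stored as flat lists of pixel values. The function name `pixelwise_ensemble` and the toy values are hypothetical and are not taken from the study.

```python
from statistics import median, stdev


def pixelwise_ensemble(predictions):
    """Sketch of a pixel-wise ensemble (assumed layout, not the authors' code).

    predictions: dict mapping an algorithm name to a list of realizations;
    each realization is a flat list with one predicted pH value per pixel.
    Returns three flat maps: ensemble value, winning algorithm, and its sd.
    """
    names = list(predictions)
    n_pixels = len(predictions[names[0]][0])
    ensemble, best, spread = [], [], []
    for i in range(n_pixels):
        stats = {}
        for name in names:
            # All realizations of this algorithm at pixel i
            values = [realization[i] for realization in predictions[name]]
            stats[name] = (stdev(values), median(values))
        # The algorithm with the lowest sd at this pixel wins
        winner = min(names, key=lambda n: stats[n][0])
        ensemble.append(stats[winner][1])  # median of the winning set
        best.append(winner)                # "best algorithm" map
        spread.append(stats[winner][0])    # sd as a proxy of uncertainty
    return ensemble, best, spread


# Toy example: 3 algorithms, 3 realizations each, 2 pixels (made-up values)
toy = {
    "rf": [[6.0, 7.0], [6.2, 7.4], [6.1, 7.2]],
    "kknn": [[5.0, 7.1], [6.0, 7.1], [7.0, 7.1]],
    "cubist": [[6.5, 6.0], [5.5, 8.0], [6.0, 7.0]],
}
ensemble_map, best_map, sd_map = pixelwise_ensemble(toy)
```

In this toy case the first pixel takes the random forest median and the second takes the kknn median, because those sets have the smallest spread at each location.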