SoilGrids250m: Global gridded soil information based on machine learning
Date
2017-02-16
Publisher
PLOS (Public Library of Science)
Abstract
This paper describes the technical development and accuracy assessment of the most
recent and improved version of the SoilGrids system at 250 m resolution (June 2016
update). SoilGrids provides global predictions for standard numeric soil properties
(organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions
and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in
addition to predictions of depth to bedrock and distribution of soil classes based on the
World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in
total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of
158 remote sensing-based soil covariates (primarily derived from MODIS land products,
SRTM DEM derivatives, climatic images and global landform and lithology maps), which
were used to fit an ensemble of machine learning methods (random forest and gradient
boosting and/or multinomial logistic regression) as implemented in the R packages
ranger, xgboost, nnet and caret. The results of 10-fold cross-validation show that
the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation
with an overall average of 61%. Improvements in the relative accuracy considering the
amount of variation explained, in comparison to the previous version of SoilGrids at 1 km
spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use
of machine learning instead of linear regression, (2) considerable investments in preparing
finer resolution covariate layers and (3) the insertion of additional soil profiles. Further
development of SoilGrids could include refinement of methods to incorporate input uncertainties
and derivation of posterior probability distributions (per pixel), and further automation
of spatial modeling so that soil maps can be generated for potentially hundreds of soil
variables. Another area of future research is the development of methods for multiscale
merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to
50 m spatial resolution) so that increasingly more accurate, complete and consistent
global soil information can be produced. SoilGrids are available under the Open
Database License (ODbL).
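The accuracy figures quoted in the abstract come from 10-fold cross-validation of an ensemble. The sketch below shows how such a figure can be computed. It is not the authors' pipeline: the paper fits random forest and gradient boosting models in R, whereas here two simple stand-in learners (a one-covariate least-squares line and a nearest-neighbour mean) are averaged, the data are synthetic, and "variation explained" is assumed to mean 1 - SSE/SST computed on the pooled out-of-fold predictions. All function names are hypothetical.

```python
import random


def fit_linear(xs, ys):
    """Least-squares line on one covariate; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=5):
    """Mean of the k nearest training targets; returns a predict function."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def variance_explained(observed, predicted):
    """Assumed definition: 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def cross_validate(xs, ys, n_folds=10, seed=0):
    """Pool out-of-fold ensemble predictions and score them once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    predicted = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        # Stand-ins for the paper's learners; the ensemble is a plain average.
        models = [fit_linear(tx, ty), fit_knn(tx, ty)]
        for i in held_out:
            predicted[i] = sum(m(xs[i]) for m in models) / len(models)
    return variance_explained(ys, predicted)


# Synthetic data: one covariate with a linear signal plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]
r2 = cross_validate(xs, ys)
print(f"10-fold CV variation explained: {100 * r2:.1f}%")
```

Each point is predicted only by models that never saw it during fitting, so the score reflects predictive accuracy on unseen locations and not goodness of fit on the training data.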
Description
Publisher's PDF
Citation
Hengl T, Mendes de Jesus J, Heuvelink GBM, Ruiperez Gonzalez M, Kilibarda M, Blagotić A, et al. (2017) SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE 12(2): e0169748. doi:10.1371/journal.pone.0169748