# Preface

Predictive Soil Mapping (PSM) is based on applying statistical and/or machine learning techniques to fit models for the purpose of producing spatial and/or spatiotemporal predictions of soil variables, i.e. maps of soil properties and classes at different resolutions. It is a multidisciplinary field combining statistics, data science, soil science, physical geography, remote sensing, geoinformation science and a number of other sciences (Scull et al. 2003; McBratney, Mendonça Santos, and Minasny 2003; Henderson et al. 2004; Boettinger et al. 2010; Zhu et al. 2015). Predictive Soil Mapping with R is about understanding the main concepts behind soil mapping, mastering R packages that can be used to produce high quality soil maps, and about optimizing all processes involved so that production costs can also be reduced.

The main differences between predictive and traditional expert-based soil mapping are that: (a) the production of maps is based on using state-of-the-art statistical methods to ensure objectivity of maps (including objective uncertainty assessment vs expert judgment), and (b) PSM is driven by automation of the processes so that overall soil data production costs can be reduced and updates of maps implemented without requirements for large investments. R, in that sense, is a logical platform to develop PSM workflows and applications, especially thanks to the vibrant and productive R spatial interest group activities and also thanks to the increasingly professional soil data packages such as, for example: soiltexture, aqp, soilprofile, soilDB and similar.

The book is divided into sections covering theoretical concepts, preparation of covariates, model selection and evaluation, prediction and final practical tips for operational PSM. Most of the chapters contain R code examples that try to illustrate the main processing steps and give practical instructions to developers and applied users.

## Connected publications

Most of the methods described in this book are based on the following publications:

Some other relevant publications / books on the subject of Predictive Soil Mapping and Data Science in general include:

Readers are also encouraged to obtain and study the following R books before following some of the more complex exercises in this book:

For the most recent developments in the R-spatial community refer to https://r-spatial.github.io, the R-sig-geo mailing list and/or https://opengeohub.org.

## Contributions

This book is designed to be constantly updated and contributions are always welcome (through pull requests, but also through adding new chapters) provided that some minimum requirements are met. To contribute a new chapter please contact the editors first. Some minimum requirements to contribute a chapter are:

1. The data needs to be available for the majority of tutorials presented in a chapter. It is best if this is via some R package or web-source.
2. A chapter should ideally focus on implementing some computing in R (it should be written as an R tutorial).
3. All examples should be computationally efficient, requiring not more than 30 seconds of computing time per process on a single-core system.
4. The theoretical basis for methods and interpretation of results should be based on peer-reviewed publications. This book is not intended to report on primary research / experimental results, but only to supplement existing research publications.
5. A chapter should consist of at least 1500 words and at most 3500 words.
6. The topic of the chapter must be closely connected to the theme of soil mapping, soil geographical databases, methods for processing spatial soil data and similar.
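
The 30-second limit in point 3 can be checked before submitting a chapter. The following is a minimal sketch, not an official check used by the editors: `check_runtime()` and its `budget_secs` argument are hypothetical names introduced here for illustration, and the simulated data only stands in for a real chapter example.

```r
# Hypothetical helper (not part of the book or its packages): time an
# expression and report whether it stays within a budget given in seconds.
check_runtime <- function(expr, budget_secs = 30) {
  # system.time() forces the lazily passed expression and measures it
  elapsed <- system.time(expr)[["elapsed"]]
  list(elapsed = elapsed, within_budget = elapsed <= budget_secs)
}

# Stand-in for a chapter example: fit a linear model to simulated data
set.seed(1)
d <- data.frame(x = runif(1e4))
d$y <- 2 * d$x + rnorm(1e4, sd = 0.1)

res <- check_runtime(lm(y ~ x, data = d))
res$within_budget
```

Elapsed (wall-clock) time is used rather than CPU time because it is what a reader running the tutorial experiences. Timings vary between machines, so a result close to the limit is worth simplifying.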

In principle, all submitted chapters should also closely follow the five pillars of Wikipedia, especially: Verifiability, Reproducibility, No original research, Neutral point of view, Good faith, No conflict of interest, and No personal attacks.

## Reproducibility

To reproduce the book, you need a recent version of R and RStudio, along with up-to-date packages, which can be installed with the following command (which requires devtools):

devtools::install_github("Envirometrix/PSMpkg")
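
Before running the install command above, it can help to confirm that the tools it relies on are present. The following is a minimal sketch under stated assumptions: `missing_packages()` is a hypothetical helper introduced here, not a function from PSMpkg, and the list of needed packages covers only devtools and bookdown, which are the two named in this section.

```r
# Hypothetical helper (not part of PSMpkg): return the packages from `pkgs`
# whose namespace cannot be loaded from the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used by the commands in this section (assumed minimal set)
needed <- c("devtools", "bookdown")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("R ", as.character(getRversion()), ": required packages found")
}
```

The install call is left commented out so that the check itself never changes the local library.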

To build the book locally, clone or download the PredictiveSoilMapping repo, start R in the root directory (e.g. by opening PredictiveSoilMapping.Rproj in RStudio) and run the following lines:

bookdown::render_book("index.Rmd") # to build the book
browseURL("docs/index.html") # to view it

## Acknowledgements

The authors are grateful for numerous contributions from colleagues around the world, especially for contributions by current and former ISRIC - World Soil Information colleagues and guest researchers: Gerard Heuvelink, Johan Leenaars, Jorge Mendes de Jesus, Wei Shangguan, David G. Rossiter, and many others. The authors are also grateful to Dutch and European citizens for financing ISRIC and Wageningen University, where work on this book was initially started. The authors acknowledge support received from the AfSIS project, which was funded by the Bill and Melinda Gates Foundation (BMGF) and the Alliance for a Green Revolution in Africa (AGRA). Many soil data processing examples in the book are based on R code developed by Dylan Beaudette, Pierre Roudier, Alessandro Samuel Rosa, Marcos E. Angelini, Guillermo Federico Olmedo, Julian Moeys, Brendan Malone, and many other developers. The authors are also grateful for comments and suggestions for improvements to the methods presented in the book by Travis Nauman, Amanda Ramcharan, David G. Rossiter and Julian Moeys.

LandGIS and SoilGrids are based on using numerous soil profile data sets kindly made available by various national and international agencies: the USA National Cooperative Soil Survey Soil Characterization database (http://ncsslabdatamart.sc.egov.usda.gov) and profiles from the USA National Soil Information System, Land Use/Land Cover Area Frame Survey (LUCAS) Topsoil Survey database (Tóth, Jones, and Montanarella 2013), Repositório Brasileiro Livre para Dados Abertos do Solo (FEBR), Sistema de Información de Suelos de Latinoamérica y el Caribe (SISLAC), Africa Soil Profiles database (Leenaars 2014), Australian National Soil Information by CSIRO Land and Water (Karssies 2011; Searle 2014), Mexican National soil profile database (Instituto Nacional de Estadística y Geografía (INEGI) 2000) provided by the Mexican Instituto Nacional de Estadística y Geografía / CONABIO, Brazilian national soil profile database (Cooper et al. 2005) provided by the University of São Paulo, Chinese National Soil Profile database (Shangguan et al. 2013) provided by the Institute of Soil Science, Chinese Academy of Sciences, soil profile archive from the Canadian Soil Information System (MacDonald and Valentine 1992) and Forest Ecosystem Carbon Database (FECD), ISRIC-WISE (Batjes 2009), The Northern Circumpolar Soil Carbon Database (Hugelius et al. 2013), eSOTER profiles (Van Engelen and Dijkshoorn 2012), SPADE (Hollis et al. 2006), Unified State Register of soil resources RUSSIA (Version 1.0. Moscow — 2014), National Database of Iran provided by the Tehran University, points from the Dutch Soil Information System (BIS) prepared by Wageningen Environmental Research, and others. 
We are also grateful to USA’s NASA, USGS and USDA agencies, the European Space Agency Copernicus projects and JAXA (Japan Aerospace Exploration Agency) for distributing vast amounts of remote sensing data (especially MODIS, Landsat, Copernicus land products and elevation data), and to the Open Source software developers of the packages rgdal, sp, raster, caret, mlr, ranger, SuperLearner, h2o and similar, without which predictive soil mapping would most likely not be possible.

This book has been inspired by the Geocomputation with R book, an Open Access book edited by Robin Lovelace, Jakub Nowosad and Jannes Muenchow. Many thanks to Robin Lovelace for helping with rmarkdown and for giving some initial tips for compiling and organizing this book. The authors are also grateful to the numerous software/package developers, especially Edzer Pebesma, Roger Bivand, Robert Hijmans, Markus Neteler, Tim Appelhans, and Hadley Wickham, whose contributions have enabled a generation of researchers and applied projects.

We are especially grateful to Jakub Nowosad for helping with preparing this publication for press and with setting up all code so that it passes automatic checks.

OpenGeoHub is a not-for-profit research foundation with headquarters in Wageningen, the Netherlands (Stichting OpenGeoHub, KvK 71844570). The main goal of OpenGeoHub is to promote the publishing and sharing of Open Geographical and Geoscientific Data and the use and development of Open Source Software. We believe that the key measure of quality of research in all sciences (and especially in geographical information sciences) is the transparency and reproducibility of the computer code used to generate results. Transparency and reproducibility increase trust in information, which makes them the fastest path to optimal decision making.

Every effort has been made to trace copyright holders of the materials used in this publication. Should we, despite all our efforts, have overlooked contributors, please contact the author and we shall correct this unintentional omission without delay and acknowledge any overlooked contributions and contributors in future updates.

Data availability and Code license: All data used in this book is available either through R packages or via the GitHub repository. If not mentioned otherwise, all code presented is available under the GNU General Public License v2.0.

### References

Scull, P., J. Franklin, O. A. Chadwick, and D. McArthur. 2003. “Predictive Soil Mapping: A Review.” Progress in Physical Geography 27 (2):171–97.

McBratney, A.B, M.L Mendonça Santos, and B Minasny. 2003. “On Digital Soil Mapping.” Geoderma 117 (1):3–52. https://doi.org/10.1016/S0016-7061(03)00223-4.

Henderson, B. L., E. N. Bui, C. J. Moran, and D. A. P. Simon. 2004. “Australia-wide predictions of soil properties using decision trees.” Geoderma 124 (3-4):383–98.

Boettinger, J. L., D. W. Howell, A. C. Moore, A. E. Hartemink, and S. Kienast-Brown, eds. 2010. Digital Soil Mapping: Bridging Research, Environmental Application, and Operation. Vol. 2. Progress in Soil Science. Springer.

Zhu, A.X., J. Liu, F. Du, S.J. Zhang, C.Z. Qin, J. Burt, T. Behrens, and T. Scholten. 2015. “Predictive Soil Mapping with Limited Sample Data.” European Journal of Soil Science 66 (3):535–47. https://doi.org/10.1111/ejss.12244.

Tóth, G., A. Jones, and L. Montanarella, eds. 2013. LUCAS Topsoil Survey. Methodology, Data and Results. JRC Technical Reports Eur 26102. Luxembourg: Publications Office of the European Union.

Leenaars, Johan G.B. 2014. Africa Soil Profiles Database, Version 1.2. A Compilation of Geo-Referenced and Standardized Legacy Soil Profile Data for Sub Saharan Africa (with Dataset). Wageningen, the Netherlands: Africa Soil Information Service (AfSIS) project; ISRIC — World Soil Information.

Karssies, Linda. 2011. CSIRO National Soil Archive and the National Soil Database (Natsoil). Data Collection v1. Canberra: CSIRO.

Searle, R. 2014. “The Australian Site Data Collation to Support the GlobalSoilMap.” GlobalSoilMap: Basis of the Global Spatial Soil Information System. CRC Press, 127.

Instituto Nacional de Estadística y Geografía (INEGI). 2000. Conjunto de Datos de Perfiles de Suelos, Escala 1:250 000 Serie II. (Continuo Nacional). Aguascalientes, Ags. México: INEGI.

Cooper, Miguel, Lúcia Maria Silveira Mendes, Wellinton Luiz Costa Silva, and Gerd Sparovek. 2005. “A National Soil Profile Database for Brazil Available to International Scientists.” Soil Science Society of America Journal 69 (3):649–52.

Shangguan, Wei, Yongjiu Dai, Baoyuan Liu, Axing Zhu, Qingyun Duan, Lizong Wu, Duoying Ji, et al. 2013. “A China Data Set of Soil Properties for Land Surface Modeling.” Journal of Advances in Modeling Earth Systems 5 (2):212–24. https://doi.org/10.1002/jame.20026.

MacDonald, K. B., and K. W. G. Valentine. 1992. CanSIS/NSDB. A General Description. Ottawa: Centre for Land and Biological Resources Research, Research Branch, Agriculture Canada.

Batjes, N. H. 2009. “Harmonized soil profile data for applications at global and continental scales: Updates to the WISE database.” Soil Use and Management 25 (2):124–27. https://doi.org/10.1111/j.1475-2743.2009.00202.x.

Hugelius, G., C. Tarnocai, G. Broll, J. G. Canadell, P. Kuhry, and D. K. Swanson. 2013. “The Northern Circumpolar Soil Carbon Database: Spatially Distributed Datasets of Soil Coverage and Soil Carbon Storage in the Northern Permafrost Regions.” Earth System Science Data 5 (1):3–13. https://doi.org/10.5194/essd-5-3-2013.

Van Engelen, V.W.P., and J.A. Dijkshoorn, eds. 2012. Global and National Soils and Terrain Digital Databases (SOTER), Procedures Manual, version 2.0. ISRIC Report 2012/04. Wageningen, the Netherlands: ISRIC - World Soil Information.

Hollis, J.M., R.J.A. Jones, C.J. Marshall, A. Holden, J.R. Van de Veen, and L. Montanarella. 2006. SPADE-2: The soil profile analytical database for Europe, version 1.0. Luxembourg: Office for official publications of the European Communities.