Soil mapping data for spatial modeling of soil organic carbon with meaningless predictors
Description
This dataset contains training data for machine learning regression models from 668 hypothetical case studies in 334 study areas across Europe, an R script for data analysis, and the random forest models that reproduce the spurious correlation between the spatial distribution of soil organic carbon (SOC) and meaningless predictors.
The 334 study areas are square tiles of 200 × 200 km and contain SOC data from SoilGrids 2.0 (Poggio et al., 2021; https://doi.org/10.5194/soil-7-217-2021) as the model outcome. Of these tiles, 250 are randomly distributed and 84 are regularly distributed to account for bias towards areas covered by multiple randomly distributed tiles. The models are trained with 500 random samples and validated with another 1000 randomly selected samples, with portrait images of researchers as independent covariates and SOC as the outcome. The portrait images were reduced to greyscale in two ways, with principal component analysis (PCA) and with sRGB to linear luminance conversion. This yields two series of 334 hypothetical case studies, or 668 hypothetical case studies in total.
The original portrait images are not included to protect personal rights and copyright. We thank Alexandre M. J.-C. Wadoux (https://doi.org/10.1111/ejss.12909) for providing the portrait images.
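The two preparation steps described above, the sRGB to linear luminance conversion and the draw of 500 training and 1000 validation samples, can be sketched as follows. This is a minimal illustration in Python, not the R script shipped with the dataset. The Rec. 709 luminance weights, the 8-bit pixel format, and the function names `linear_luminance`, `to_greyscale` and `split_samples` are assumptions made for this example. The PCA variant is not shown.

```python
import random


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer function for one channel value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_luminance(r: int, g: int, b: int) -> float:
    """Relative luminance in [0, 1] of an 8-bit sRGB pixel.

    Assumption: Rec. 709 weights; the dataset's own script may differ.
    """
    rl, gl, bl = (srgb_to_linear(v / 255.0) for v in (r, g, b))
    return 0.2126 * rl + 0.7152 * gl + 0.0722 * bl


def to_greyscale(pixels):
    """Convert a nested list of (r, g, b) tuples to a nested list of luminances."""
    return [[linear_luminance(*px) for px in row] for row in pixels]


def split_samples(n_cells: int, n_train: int = 500, n_valid: int = 1000, seed: int = 0):
    """Draw disjoint random training and validation indices from n_cells grid cells."""
    rng = random.Random(seed)
    idx = rng.sample(range(n_cells), n_train + n_valid)
    return idx[:n_train], idx[n_train:]
```

Drawing both sets in one call to `rng.sample` guarantees that no grid cell appears in both the training and the validation set.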
Files
(5.0 GB)
| MD5 checksum | Size |
|---|---|
| 2a717c6a4c09ee3256d818dbb68e83a6 | 13.1 kB |
| 36bc382eacdf6f42228c914c3ecd4c3d | 4.5 GB |
| f94024c539c929a9c02f4637394805de | 159.7 kB |
| 8a58d9f37207a6da8efc08bfc443121e | 192.5 kB |
| daf6ddaf59ba2189c615b8d5af67b554 | 371.6 MB |
| dd88cfa1953b953d313b712e8b89add9 | 124.8 MB |
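The MD5 checksums listed for the files can be used to check that a download is complete. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local file path must be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with a checksum such as 'md5:2a71...'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected = expected_md5.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected
```

For example, `verify_download(local_path, "md5:36bc382eacdf6f42228c914c3ecd4c3d")` should return `True` for an intact copy of the 4.5 GB file, where `local_path` is wherever that file was saved. Reading in chunks matters here because the largest file does not fit comfortably in memory on many machines.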
Additional details
Related works
- Is supplement to (Event): https://conference.ufz.de/frontend/index.php?page_id=3985&v=List&do=15&day=all&ses=1126#anker_session_1126
- References (Text): DOI 10.5194/soil-7-217-2021
Data quality
- Accuracy: Model accuracy ranges from 0.16 to 0.91 for the concordance correlation coefficient, from -2.33 to 3.95 g kg⁻¹ SOC for the mean error (ME), and from 5.4 to 41.2 g kg⁻¹ SOC for the root mean square error (RMSE), while SOC in the case studies had an arithmetic mean of 100.4 g kg⁻¹.
- Completeness: not relevant
- Conformity: not relevant
- Consistency: not relevant
- Credibility: The SOC data are modelled data obtained from SoilGrids 2.0 (Poggio et al., 2021; https://doi.org/10.5194/soil-7-217-2021) and have an RMSE of 40 g kg⁻¹.
- Processability: not relevant
- Relevance: not relevant
- Timeliness: not relevant
- Understandability: The data are understandable to data scientists in general and to soil scientists with a focus on pedometrics in particular.
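
The three accuracy measures reported under Data quality can be computed from paired observed and predicted SOC values as sketched below. This is an illustrative Python version under stated assumptions, not the code used to produce the reported numbers. It assumes Lin's concordance correlation coefficient with population (1/n) moments and a mean error defined as predicted minus observed. The function names are hypothetical.

```python
import math


def mean_error(obs, pred):
    """Mean error (bias); assumed sign convention: predicted minus observed."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error of the predictions."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def concordance_cc(obs, pred):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
```

Unlike the Pearson correlation, the concordance correlation coefficient is reduced by systematic bias: predictions that are perfectly correlated with the observations but shifted by a constant score below 1.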