Soil spectral libraries are an effective way to organize soil data in a standard structure to feed predictive models (Viscarra Rossel et al., 2016). Because soils differ widely in composition and properties, building large spectral libraries is a way to increase the sample size. Building large soil spectral libraries usually means combining data collected for different purposes under different standards and methodologies, which often leaves databases with disparate and inconsistent soil data. Even so, the availability of spectral data has facilitated the rapid assessment of several soil properties. Among these, soil organic carbon (SOC), an important indicator of soil health, has received much attention. Despite the well-known potential of visible near-infrared (Vis–NIR) spectroscopy to predict SOC, little attention has been given to the reliability and effectiveness of the analytical methods used as reference. The global spectral library (Viscarra Rossel et al., 2016), for example, has 17,931 SOC records, of which only 9,757 have the analytical method described. The methods listed include wet combustion, quantified either by titration (WCt) or colorimetry (WCc), dry combustion (DC), and four other methods. Although all of them measure carbon in soil samples and aim to represent the same soil property, each is a distinct, operationally defined procedure (Batjes et al., 2020), and even when standardized they are not compatible. Only harmonization can make them usable at a higher level of aggregation or generalization, by bringing together data of different types and sources so that they become comparable. This is the reason for the scientific interest in harmonization approaches such as those proposed in the “Implementation Plan for Pillar Five of the Global Soil Partnership” (Baritz et al., 2014).
Despite ongoing efforts by the FAO through the Global Soil Laboratory Network (Hartmann & Suvannang, 2019), standards and protocols that ensure compatible measurements across laboratories are still lacking. Consequently, most spectral models are calibrated and compared without distinguishing between analytical methods. Sometimes data from more than one method are even pooled to increase the sample size. However, some combinations of preprocessing and models may be more sensitive to laboratory (measurement) error than others, and it is not clear which harmonization procedures may reduce this impact. Our hypothesis is that the predictive performance of Vis–NIR spectral modeling depends on the analytical method employed and on its compatibility for producing reliable SOC predictions. To test this hypothesis, we set up three experiments to be applied to the spectral library of southern Brazil (Moura-Bueno et al., 2020).
Study 1. Evaluate whether the analytical method affects SOC prediction by Vis–NIR. The leave-one-out cross-validation performance of three predictive models (Random Forest, Cubist, and Partial Least Squares Regression), calibrated using SOC data from 395 soil samples, was analyzed across three analytical methods (DC, WCt, and WCc) and three Vis–NIR spectral preprocessing techniques (smoothing, continuum removal, and Savitzky-Golay first derivative).
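The leave-one-out procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the single spectral feature, the SOC values, and the simple linear model are hypothetical stand-ins, not the Random Forest, Cubist, or Partial Least Squares models of the study, and no spectral preprocessing is shown.

```python
from math import sqrt


def fit_simple_linear(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_predictions(x, y):
    """Leave-one-out: predict each sample with a model fitted on all the others."""
    predictions = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_simple_linear(x_train, y_train)
        predictions.append(slope * x[i] + intercept)
    return predictions


# Hypothetical data: one spectral feature per sample and the SOC value (%)
# reported for the same samples by each reference method.
feature = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
soc_by_method = {
    "DC": [0.8, 1.5, 2.1, 2.9, 3.4, 4.2],
    "WCt": [0.6, 1.2, 1.8, 2.2, 2.9, 3.3],
    "WCc": [0.7, 1.6, 1.9, 3.1, 3.2, 4.5],
}

# One leave-one-out RMSE per reference method, so the methods can be compared.
loo_rmse = {}
for method, soc in soc_by_method.items():
    predicted = loo_predictions(feature, soc)
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, soc)]
    loo_rmse[method] = sqrt(sum(squared_errors) / len(soc))
```

Keeping the spectra fixed and changing only the reference values isolates the contribution of the analytical method to the cross-validation error.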
Study 2. Evaluate whether a standard analytical method matters more than a large sample size for accurate SOC predictions by Vis–NIR. The spectral models from study 1 will be used, one for each individual reference analytical method (DC, WCc, and WCt). Additional SOC data from the spectral library will be added iteratively, amounting to 10 to 500% of the data for the reference method. The additional data are a mix of measurements derived from DC, WCc, and WCt. These samples will be selected by bootstrap resampling. This technique was chosen because the errors of wet combustion determinations are not homoscedastic. Resampling allows many different combinations of samples, so that the prediction results are less affected by extreme values in the data set.
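A minimal sketch of the iterative augmentation by bootstrap resampling is given below. The sample identifiers, the intermediate fractions between 10 and 500%, the number of replicates, and the helper name bootstrap_augment are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def bootstrap_augment(reference, pool, fraction, rng):
    """Return the reference set plus extra samples drawn from the pool
    with replacement; the number of extras is fraction * len(reference)."""
    n_extra = round(fraction * len(reference))
    extra = rng.choices(pool, k=n_extra)  # sampling with replacement
    return reference + extra


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sample records: (analytical method, sample id).
reference = [("DC", i) for i in range(20)]
pool = [(m, i) for m in ("DC", "WCt", "WCc") for i in range(100, 130)]

# 10% to 500% of the reference set size; intermediate steps are assumed.
fractions = [0.1, 0.5, 1.0, 2.0, 5.0]
n_replicates = 3  # assumed; each replicate is a different bootstrap draw

sizes = {}
for fraction in fractions:
    replicates = [
        bootstrap_augment(reference, pool, fraction, rng)
        for _ in range(n_replicates)
    ]
    # In a full analysis each replicate would be used to recalibrate a model.
    sizes[fraction] = [len(r) for r in replicates]
```

Repeating the draw several times for each fraction gives a distribution of results rather than a single value, which is what makes the comparison less sensitive to extreme values.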
Study 3. Evaluate whether data harmonization can overcome the effects of the analytical methods and improve SOC prediction by Vis–NIR. The spectral models from study 1 will be used. The samples from the spectral library will be harmonized using pedotransfer functions. Additional (harmonized) SOC data from the spectral library will be added iteratively, amounting to 10 to 500% of the data for the reference method. The models from studies 1, 2, and 3 will be compared using four performance metrics: coefficient of determination (R2), root mean square error (RMSE), mean error (ME), and the ratio of performance to interquartile range (RPIQ).
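The four metrics can be computed as in the sketch below. Several conventions are assumptions, since the abstract does not define them: R2 is taken as one minus the ratio of residual to total sum of squares, ME as the mean of predicted minus observed, and the quartiles for RPIQ use the inclusive method of the Python standard library. The function name evaluate is hypothetical.

```python
from math import sqrt
from statistics import mean, quantiles


def evaluate(observed, predicted):
    """Return R2, RMSE, ME, and RPIQ for paired observed and predicted values."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    me = mean(residuals)  # mean error (bias), predicted minus observed
    rmse = sqrt(mean([r ** 2 for r in residuals]))

    mean_observed = mean(observed)
    ss_residual = sum(r ** 2 for r in residuals)
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    r2 = 1.0 - ss_residual / ss_total

    # Interquartile range of the observed values (Q3 - Q1).
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse  # undefined when RMSE is zero

    return {"R2": r2, "RMSE": rmse, "ME": me, "RPIQ": rpiq}


# Hypothetical SOC values (%), with a constant overprediction of 0.5.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
metrics = evaluate(observed, predicted)
```

In this hypothetical case the constant bias appears in ME and RMSE alike, which illustrates why ME is reported alongside RMSE: it separates systematic offset from random scatter.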
We expect the prediction performance of the models to vary with the SOC analytical method employed, and we expect that these effects can be overcome by data harmonization. The results will be useful both to guide the selection of analytical methods for new projects and to manage databases that are already available. This will be an important step towards ensuring the interoperability of spectral libraries and other databases. By making SOC data compatible, we hope to improve SOC predictions and the confidence in them. Moreover, we hope that these concepts will be discussed more critically by the soil science community and included in a spectroscopy modeling protocol to prevent predictions from poor-quality data.
Baritz, R., Erdogan, H., Fujii, K., Takata, Y., Nocita, M., Bussian, B., … Vargas, R., 2014. Plan of Action for Pillar Five of the Global Soil Partnership: Harmonization of methods, measurements and indicators for the sustainable management and protection of soil resources. Global Soil Partnership (GSP), Food and Agriculture Organization of the United Nations. http://www.fao.org/3/a-az922e.pdf
Batjes, N.H., Ribeiro, E., van Oostrum, A., 2020. Standardised soil profile data to support global mapping and modelling (WoSIS snapshot 2019). Earth System Science Data 12, 299–320. https://doi.org/10.5194/essd-12-299-2020
Hartmann, C., Suvannang, N., 2019. Global Soil Laboratory Assessment, 2018 online survey. FAO, Rome. http://www.fao.org/3/ca7091en/CA7091EN.pdf
Moura-Bueno, J.M., Dalmolin, R.S.D., Horst, T.Z., ten Caten, A., Vasques, G.M., Dotto, A.C., Grunwald, S., 2020. When does stratification of a subtropical soil spectral library improve predictions of soil organic carbon content? Science of The Total Environment 139895. https://doi.org/10.1016/j.scitotenv.2020.139895
Viscarra Rossel, R.A., Behrens, T., Ben-Dor, E., Brown, D.J., Demattê, J.A.M., Shepherd, K.D., … Ji, W., 2016. A global spectral library to characterize the world’s soil. Earth-Science Reviews 155, 198–230. https://doi.org/10.1016/j.earscirev.2016.01.012