Predicting the spatial distribution of soil organic matter using the model consisting of the Boruta algorithm and the optimized GA combined with the geostatistical method
GAO Peng-Li1, REN Da-Lu2, LI Chao-Hui3, FENG Zhi-Qiang1,4, MIAO Hong-Yun2, QIAO Lin2, WANG Jian-Wu4, YANG Yong-Liang4, ZHANG Li-Ming4, LI Guang-Hui5
1. Shanxi Province Key Laboratory of Metallogeny and Assessment of Strategic Mineral Resources, Department of Earth Science and Engineering, Taiyuan University of Technology, Taiyuan 030024, China
2. No. 213 Geology Team of Shanxi Provincial Geological Prospecting Bureau, Linfen 041000, China
3. The Third Geological Exploration Institute, General Administration of Metallurgical Geology of China, Taiyuan 030006, China
4. Shanxi Institute of Geological Survey Co., Ltd., Taiyuan 030006, China
5. College of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China
Establishing a spatial prediction model for soil organic matter (SOM) makes it possible to predict the spatial distribution of SOM content accurately and supports scientific soil management and the enhancement of ecosystem services. Focusing on the soils of Yonghe County, Linfen City, Shanxi Province, this study extracted topographic factors and vegetation indices from a digital elevation model (DEM) and vegetation remote sensing data and combined them with soil attributes as candidate variables. The Boruta algorithm was then used to select the characteristic variables most strongly related to SOM as auxiliary variables. These auxiliary variables served as model input and the measured SOM values as model output. SOM content was predicted for the training-set samples using four methods: ordinary kriging (OK), the back-propagation neural network (BPNN), the genetic algorithm-optimized BPNN (GA-BPNN), and the GA-optimized BPNN combined with the geostatistical method (GA-BPNN-OK). Prediction accuracy was then compared using the validation-set samples. The results show that: (1) The Boruta algorithm selected the characteristic variables and ranked them by importance as total nitrogen > topographic wetness index (TWI) > elevation > slope > normalized difference vegetation index (NDVI) > enhanced vegetation index (EVI); (2) Despite local differences, the SOM predictions of the four methods showed roughly the same overall spatial distribution: low in the western and southwestern parts of the study area and high in the eastern and southeastern parts; (3) Compared with the other three models, the GA-BPNN-OK model delineated low- and high-value areas more distinctly and in finer detail in the predicted SOM distribution map; (4) In the comparison of prediction accuracy indices, the GA-BPNN-OK method yielded the lowest root mean square error (RMSE, 0.059), mean absolute error (MAE, 0.240), and mean relative error (MRE, 0.165), and the highest fitting coefficient (R2, 0.78). To verify that the Boruta algorithm improves model accuracy, the full variable set and the variables retained after feature selection were each used as input to the GA-BPNN method. Comparing the predictions shows that the Boruta algorithm reduced the model error. Therefore, with the Boruta-selected characteristic variables as auxiliary variables, the GA-BPNN-OK method predicted the spatial distribution of SOM content most accurately, and the combination of the two constitutes the optimal prediction model.
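The four accuracy indices named in the abstract can be illustrated with a short sketch. It assumes the common textbook definitions (in particular, R2 computed as one minus the ratio of residual to total sum of squares); the abstract does not state the formulas, so the paper's exact definitions may differ. The function name `accuracy_indices` and the sample values are hypothetical and are not data from the study.

```python
import math


def accuracy_indices(observed, predicted):
    """Return RMSE, MAE, MRE and R2 for paired observed/predicted values.

    Assumed definitions (not confirmed by the paper):
      RMSE = sqrt(mean((p - o)^2))
      MAE  = mean(|p - o|)
      MRE  = mean(|p - o| / |o|)   (observed values must be non-zero)
      R2   = 1 - SS_res / SS_tot
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mre = sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    return {"RMSE": rmse, "MAE": mae, "MRE": mre, "R2": r2}


# Hypothetical validation-set values, for illustration only.
measured_som = [10.0, 12.0, 8.0, 14.0]
predicted_som = [11.0, 11.0, 9.0, 13.0]
indices = accuracy_indices(measured_som, predicted_som)
```

In a comparison like the one described above, the same function would be applied to each model's validation-set predictions, with lower RMSE, MAE, and MRE and higher R2 indicating the better model.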
GAO Peng-Li, REN Da-Lu, LI Chao-Hui, FENG Zhi-Qiang, MIAO Hong-Yun, QIAO Lin, WANG Jian-Wu, YANG Yong-Liang, ZHANG Li-Ming, LI Guang-Hui. Predicting the spatial distribution of soil organic matter using the model consisting of the Boruta algorithm and the optimized GA combined with the geostatistical method. Geophysical and Geochemical Exploration, 2024, 48(3): 747-758.