Journal of Applied Sciences (应用科学学报) ›› 2024, Vol. 42 ›› Issue (3): 388-404. doi: 10.3969/j.issn.0255-8297.2024.03.002

• Signal and Information Processing •

Estimating Flash Flood Disaster Susceptibility Based on K-means Clustering and Ensemble Learning Approaches

GUAN Zheng1,2, YIN Yongqiang1,2, ZHANG Xiaoxiang1,2, CHEN Yuehong1,2

  1. College of Hydrology and Water Resources, Hohai University, Nanjing 210098, Jiangsu, China;
  2. Center for Geospatial Intelligence and Watershed Science (CGIWaS), Hohai University, Nanjing 210098, Jiangsu, China
  • Received: 2023-06-29 Published: 2024-06-06
  • Corresponding author: ZHANG Xiaoxiang, professor and doctoral supervisor; research interests: geographic information systems, spatial analysis and modeling, digital twin watersheds, and the history of geographic thought. E-mail: xiaoxiang@hhu.edu.cn
  • Funding:
    Supported by the National Key Research and Development Program of China (No. 2023YFC3006701)

Abstract: To better analyze the impact of spatial heterogeneity on the assessment of flash flood disaster susceptibility, a small-catchment susceptibility assessment model based on K-means clustering and ensemble learning is developed. First, 12 338 small catchments in Jiangxi Province, China, are selected as the study area, and K-means clustering is performed on rainfall indicators of different frequencies for each duration. Second, using the sum of squared errors (SSE) and the mean silhouette coefficient as clustering evaluation indices, the catchments are divided into two subsets that are compact within each cluster and well separated between clusters. Finally, for each subset, ten flash flood influencing factors are selected from three aspects (geometric, environmental, and precipitation characteristics): average slope, centroid elevation, shape coefficient, gradient of the longest flow path, topographic wetness index, normalized difference vegetation index, distance to the nearest river, rainfall, peak flow modulus, and time of concentration. The adaptive boosting (AdaBoost) and eXtreme gradient boosting (XGBoost) algorithms are then applied to assess flash flood susceptibility. The results show that precipitation is an important factor in flash flood disasters: high-precipitation areas in Jiangxi Province are generally more susceptible than low-precipitation areas. High-risk areas are scattered across the province, mainly in the northeastern region and along the northwestern edge. When susceptibility is assessed separately for the two clusters of similar catchments, the area under the receiver operating characteristic curve (AUC) is above 0.90 in both cases, an improvement in accuracy over the assessment without clustering. As a precursor step to the susceptibility assessment model, the clustering strategy effectively addresses the heterogeneity of small catchments.

Key words: spatial heterogeneity, K-means clustering, ensemble learning, adaptive boosting (AdaBoost), eXtreme gradient boosting (XGBoost), flash flood disaster
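
The clustering step summarized in the abstract (K-means on rainfall indicators, with the number of clusters judged by the sum of squared errors and the mean silhouette coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the three "rainfall indicator" columns and their values are invented for illustration, all function names are hypothetical, and a plain Lloyd's K-means with farthest-first initialization is assumed in place of whatever library or settings the study used.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def mean_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b to the nearest other cluster."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.sqrt(sq_dist(p, q))
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def make_synthetic_catchments(n_per_group=40, seed=42):
    """Invented data: two groups of catchments, each described by three
    hypothetical design-rainfall indicators (mm)."""
    rng = random.Random(seed)
    low = [(rng.gauss(40, 4), rng.gauss(80, 6), rng.gauss(120, 8))
           for _ in range(n_per_group)]
    high = [(rng.gauss(70, 4), rng.gauss(140, 6), rng.gauss(210, 8))
            for _ in range(n_per_group)]
    return low + high


def choose_k(points, candidates=(2, 3, 4, 5)):
    """Return the k with the highest mean silhouette, plus all scores."""
    results = {}
    for k in candidates:
        labels, centroids = kmeans(points, k)
        results[k] = (sse(points, labels, centroids),
                      mean_silhouette(points, labels))
    best_k = max(results, key=lambda k: results[k][1])
    return best_k, results


catchments = make_synthetic_catchments()
best_k, results = choose_k(catchments)
for k, (err, sil) in results.items():
    print(f"k={k}: SSE={err:.1f}, mean silhouette={sil:.3f}")
print("selected k:", best_k)
```

In this sketch the silhouette coefficient alone picks the number of clusters, while SSE is reported alongside it for an elbow-style inspection. In the workflow the abstract describes, each resulting subset would then be passed to a separate AdaBoost or XGBoost classifier and scored by AUC; that stage is omitted here.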

CLC number: