23–24 May 2026
Address: Tsinghua University campus
Asia/Shanghai timezone

Evaluating the Performance of AI in Crisis Detection: A Multi-Scenario Hindcast of Extreme Precipitation Forecasts

Not scheduled
12 min
No. 30 Shuangqing Road, Haidian District, Beijing
Oral presentation: Safety Science and Technology

Speaker

Feng Huang (Tsinghua University)

Abstract

Artificial Intelligence Weather Prediction (AIWP) models excel on global mean-error metrics, yet their ability to detect low-probability, high-impact extreme events, which is critical for emergency response, remains under-examined. This study evaluates three leading models (GraphCast, FuXi, and the Artificial Intelligence Forecasting System, AIFS) against satellite observations and a numerical weather prediction baseline, the Global Forecast System (GFS), across four diverse historical crises. Within a crisis-centric evaluation framework comprising Peak Amplitude Ratio (PAR), Spatial Correlation (SC), Root Mean Square Error (RMSE), volumetric Bias, and the Symmetric Extremal Dependence Index (SEDI), preliminary results reveal a systematic intensity deficit in the AIWP models. While GFS maintains a PAR above 0.65 in most scenarios, the AI models underestimate peak rainfall by more than 90% and exhibit significant spatial displacement. These findings suggest that the statistical smoothing inherent in these models turns catastrophic signals into benign-looking forecasts. Consequently, over-reliance on current AIWP models for crisis detection may create a false sense of security, potentially exacerbating rather than mitigating emergency vulnerabilities.
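The abstract names the five scores but not their formulas. The following is a minimal, hypothetical Python sketch of how they might be computed on flattened lists of grid-cell rainfall totals; it is not the study's code. It assumes that PAR is the ratio of the forecast maximum to the observed maximum, that SC is the Pearson correlation across grid cells, that volumetric Bias is total forecast rainfall divided by total observed rainfall, and that all cells carry equal weight (no area weighting). SEDI uses its usual definition in terms of the hit rate H and false-alarm rate F of a 2x2 contingency table at a user-chosen threshold. All function names and the toy field are illustrative.

```python
import math
from typing import Sequence

# Hypothetical helpers: each field is a flat list of rainfall totals (e.g. mm)
# over the same grid cells, all weighted equally (no area weighting assumed).


def peak_amplitude_ratio(fcst: Sequence[float], obs: Sequence[float]) -> float:
    """Assumed PAR: forecast field maximum divided by observed field maximum."""
    return max(fcst) / max(obs)


def spatial_correlation(fcst: Sequence[float], obs: Sequence[float]) -> float:
    """Assumed SC: Pearson correlation between the two fields across grid cells."""
    n = len(obs)
    mf, mo = sum(fcst) / n, sum(obs) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(fcst, obs))
    sf = math.sqrt(sum((f - mf) ** 2 for f in fcst))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sf * so)


def rmse(fcst: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error over all grid cells."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fcst, obs)) / len(obs))


def volumetric_bias(fcst: Sequence[float], obs: Sequence[float]) -> float:
    """Assumed Bias: total forecast rainfall / total observed rainfall (1.0 = unbiased)."""
    return sum(fcst) / sum(obs)


def sedi(fcst: Sequence[float], obs: Sequence[float], threshold: float) -> float:
    """SEDI from the 2x2 contingency table of exceedances of `threshold`."""
    pairs = list(zip(fcst, obs))
    hits = sum(f >= threshold and o >= threshold for f, o in pairs)
    misses = sum(f < threshold and o >= threshold for f, o in pairs)
    false_alarms = sum(f >= threshold and o < threshold for f, o in pairs)
    correct_negatives = sum(f < threshold and o < threshold for f, o in pairs)
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        raise ValueError("need both observed events and observed non-events")
    h = hits / (hits + misses)  # hit rate H
    fa = false_alarms / (false_alarms + correct_negatives)  # false-alarm rate F
    if not (0.0 < h < 1.0 and 0.0 < fa < 1.0):
        # The logarithms below are undefined when H or F is exactly 0 or 1.
        raise ValueError("SEDI is undefined when H or F equals 0 or 1")
    num = math.log(fa) - math.log(h) - math.log(1 - fa) + math.log(1 - h)
    den = math.log(fa) + math.log(h) + math.log(1 - fa) + math.log(1 - h)
    return num / den


if __name__ == "__main__":
    # Toy 4-cell field (not data from the study): a sharp observed peak and a
    # smoothed forecast that keeps the spatial pattern but not the intensity.
    obs = [0.0, 10.0, 100.0, 10.0]
    fcst = [0.0, 5.0, 8.0, 5.0]
    print(f"PAR  = {peak_amplitude_ratio(fcst, obs):.2f}")
    print(f"SC   = {spatial_correlation(fcst, obs):.2f}")
    print(f"RMSE = {rmse(fcst, obs):.1f}")
    print(f"Bias = {volumetric_bias(fcst, obs):.2f}")
```

On this toy field the forecast keeps the observed pattern (SC of about 0.77) yet reaches only 8% of the observed peak (PAR = 0.08) and 15% of the observed volume. This "correlated but too flat" behaviour is exposed directly by PAR and Bias, whereas a single mean-error score such as RMSE does not separate intensity loss from displacement.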


Keywords: Crisis Detection, Extreme Precipitation, AI Weather Prediction, Hindcast Evaluation.

Author

Feng Huang (Tsinghua University)
