Development of a Basic Dataset of Severe Convective Weather for Artificial Intelligence Training

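  • Summary: A basic training dataset for artificial intelligence applications in severe convective weather (Severe Convective Weather DataSet for AI application, SCWDS) is compiled from multi-source data, including operational observations, historical disaster records and internet media reports. SCWDS covers five types of severe convective weather over mainland China from 2012 to 2019: thunderstorms, thunderstorm gales, short-time heavy rain, hail and tornadoes. It contains 184865 cases (station occurrences) and 9256405 samples. Each sample includes an annotation of the severe convective weather event together with the surface observations, radiosonde data, lightning location data, radar base data, multi-channel satellite data and reanalysis products within the corresponding spatiotemporal window. Thunderstorms, short-time heavy rain and hail occur mainly from June to August, thunderstorm gales mainly from April to May, and tornadoes mainly from June to August and in April. The occurrence time of short-time heavy rain has a bimodal distribution, with peaks at 03:00-04:00 (Beijing Time, likewise hereafter) and 15:00-16:00, whereas thunderstorms, thunderstorm gales, hail and tornadoes occur mainly during 13:00-19:00. Thunderstorms occur mainly in South China, Jiangnan, the Tibetan Plateau and the Yunnan-Guizhou Plateau. Thunderstorm gales occur mainly in the northern part of North China and along the Jiangnan coast. Short-time heavy rain occurs mainly in Southwest China, South China, Jiangnan, and the Huanghuai and Jianghuai regions. Hail occurs mainly on the Tibetan Plateau, the Yunnan-Guizhou Plateau and in the northern part of North China. As basic data for training machine learning models, SCWDS provides data support for the intelligent identification and forecasting of severe convective weather.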

     

    Abstract: Deep learning shows great potential in severe convective weather nowcasting. Building a deep learning model requires extensive training, which in turn depends on a large, high-quality dataset. Based on multi-source observations of the China Meteorological Administration (CMA), disaster reports and internet media information, a dataset of severe convective weather for artificial intelligence training (SCWDS) is established. SCWDS is organized by severe convective weather events. It includes 184865 cases, and each case is composed of several samples within the spatiotemporal window of the event. In total, SCWDS contains 9256405 samples of thunderstorms, thunderstorm gales, short-time heavy rain, hail and tornadoes in China from 2012 to 2019. Each sample includes a severe weather event annotation and, for the corresponding spatiotemporal window, surface observations of temperature, precipitation, pressure, humidity and winds (average wind speed and maximum wind speed); radiosonde observations of temperature, dew point temperature, geopotential height and winds from 1000 to 1 hPa; lightning intensity observations; radar volume scan data; visible, long wave infrared, water vapor and mid infrared channels of FY-2E, FY-2G and FY-2D nominal disk data; and environmental factors from ERA5 reanalysis data. Quality control and data cleaning are carried out, and all cases with temporal discontinuities, logical inconsistencies or non-convective causes are removed. The results show that thunderstorms, short-time heavy rain and hail mainly occur from April to September, especially in summer from June to August. Thunderstorm gales, however, occur most frequently from April to May. Tornadoes occur frequently from June to August and in April. Thunderstorms, thunderstorm gales and hail show the same diurnal variation, with the high-frequency period concentrated between afternoon and evening. The diurnal cycle of short-time heavy rain is bimodal, with peaks at 03:00-04:00 BT (Beijing Time) and 15:00-16:00 BT. The occurrence of severe convective weather shows large spatial variability. Thunderstorms are mainly distributed in South China, Jiangnan, the Tibetan Plateau and the Yunnan-Guizhou Plateau, where the frequency generally exceeds 40 occurrences. Thunderstorm gales are mainly distributed in the northern part of North China and Xinjiang and in coastal areas south of the Yangtze River, with a frequency of more than 10 occurrences. Short-time heavy rain is mainly concentrated in Southwest China, South China, Jiangnan and the Huanghuai region, with a frequency of more than 100 occurrences. Hail is mainly distributed in the Tibetan Plateau, the Yunnan-Guizhou Plateau and the northern part of North China, where the frequency generally exceeds 6 occurrences. Tornadoes are mainly distributed in Jiangsu, Guangdong and the Qiongzhou Strait.
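
As a rough illustration of how the diurnal statistics above could be derived from event annotations, the following Python sketch counts cases by hour of day and computes the average number of samples per case from the totals quoted in the abstract. The record layout (`CaseLabel` and its fields), the event-type strings and the demo records are assumptions made for this example; they are not the actual SCWDS schema or data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Totals quoted in the abstract.
N_CASES = 184_865
N_SAMPLES = 9_256_405


@dataclass(frozen=True)
class CaseLabel:
    """Hypothetical annotation record; field names are assumptions, not the SCWDS schema."""
    event_type: str     # e.g. "short_time_heavy_rain" (illustrative label)
    station_id: str     # illustrative station identifier
    time_bt: datetime   # event time in Beijing Time


def hourly_counts(labels, event_type):
    """Return a 24-element list of case counts per hour of day for one event type."""
    counts = Counter(
        label.time_bt.hour for label in labels if label.event_type == event_type
    )
    return [counts.get(hour, 0) for hour in range(24)]


def mean_samples_per_case():
    """Average number of samples per case, from the totals in the abstract."""
    return N_SAMPLES / N_CASES


# Made-up demo records, only to show the call pattern.
demo_labels = [
    CaseLabel("short_time_heavy_rain", "S0001", datetime(2016, 7, 1, 3, 20)),
    CaseLabel("short_time_heavy_rain", "S0002", datetime(2016, 7, 1, 3, 50)),
    CaseLabel("short_time_heavy_rain", "S0001", datetime(2016, 7, 2, 15, 10)),
    CaseLabel("hail", "S0003", datetime(2016, 6, 12, 16, 5)),
]
demo_rain_by_hour = hourly_counts(demo_labels, "short_time_heavy_rain")
```

With the abstract's totals, `mean_samples_per_case()` comes to roughly 50 samples per case, which is consistent with each case being expanded into many samples across its spatiotemporal window.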

     
