Development of Basic Dataset of Severe Convective Weather for Artificial Intelligence Training
-
Abstract (translated from the Chinese): Based on multi-source data including operational observations, historical disaster records and internet media, a basic training dataset for artificial intelligence applications in severe convective weather (Severe Convective Weather DataSet for AI application, SCWDS) is compiled. SCWDS covers five types of severe convective weather in mainland China from 2012 to 2019: thunderstorm, thunderstorm gale, short-term heavy rain, hail and tornado. It contains 184865 cases (station occurrences) and 9256405 samples. Each sample contains the annotation of the severe convective weather event and the surface observations, radiosonde data, lightning location data, radar base data, satellite multi-channel data and reanalysis products within the corresponding spatiotemporal window. Thunderstorm, short-term heavy rain and hail mainly occur from June to August, thunderstorm gale mainly from April to May, and tornado mainly from June to August and in April. The occurrence time of short-term heavy rain shows a bimodal distribution with peaks at 03:00-04:00 (Beijing Time, the same below) and 15:00-16:00, whereas thunderstorm, thunderstorm gale, hail and tornado mainly occur between 13:00 and 19:00. Thunderstorm mainly occurs in South China, Jiangnan, the Tibet Plateau and the Yunnan-Guizhou Plateau; thunderstorm gale mainly in northern North China and coastal Jiangnan; short-term heavy rain mainly in Southwest China, South China, Jiangnan and the Huanghuai and Jianghuai regions; hail mainly in the Tibet Plateau, the Yunnan-Guizhou Plateau and northern North China. As basic data for training machine learning models, SCWDS provides data support for intelligent identification and forecasting of severe convective weather.
Abstract: Deep learning shows great potential in severe convective weather nowcasting. Building a deep learning model requires extensive training, which in turn depends on a large, high-quality dataset. Based on multi-source observations of the China Meteorological Administration (CMA), disaster reports and internet media information, a dataset of severe convective weather for artificial intelligence training (SCWDS) is established. SCWDS is organized by severe convective weather events. It includes 184865 cases, and each case is composed of several samples within the spatiotemporal window of the event. In total, SCWDS contains 9256405 samples of thunderstorm, thunderstorm gale, short-term heavy rain, hail and tornado in China from 2012 to 2019. Each sample includes the severe weather event annotation and, for the corresponding spatiotemporal window, surface observations of temperature, precipitation, pressure, humidity and winds (average wind speed and maximum wind speed); radiosonde observations of temperature, dew point temperature, geopotential height and winds from 1000 to 1 hPa; lightning intensity observations; radar volume scan data; visible, long-wave infrared, water vapor and mid-infrared channels of FY-2E, FY-2G and FY-2D nominal disk data; and environmental factors from ERA5 reanalysis data.
Quality control and data cleaning are carried out, and all cases that show time discontinuity, violate logical relationships or are caused by non-convective factors are eliminated. The dataset shows that thunderstorm, short-term heavy rain and hail mainly occur from April to September, especially from June to August. Thunderstorm gale, however, occurs most frequently from April to May. Tornado occurs frequently from June to August and in April. Thunderstorm, thunderstorm gale and hail show the same diurnal variation, with the high-frequency period concentrated between afternoon and evening. The diurnal cycle of short-term heavy rain is bimodal, with peaks at 03:00-04:00 BT and 15:00-16:00 BT (Beijing Time). The occurrence of severe convective weather shows large spatial variability. Thunderstorm is mainly distributed in South China, Jiangnan, the Tibet Plateau and the Yunnan-Guizhou Plateau, where the frequency generally exceeds 40 occurrences. Thunderstorm gale is mainly distributed in the northern part of North China and Xinjiang and in coastal areas south of the Yangtze River, with a frequency of more than 10 occurrences. Short-term heavy rain is mainly concentrated in Southwest China, South China, Jiangnan and the Huanghuai regions, with a frequency of more than 100 occurrences. Hail is mainly distributed in the Tibet Plateau, the Yunnan-Guizhou Plateau and the northern part of North China, where the frequency generally exceeds 6 occurrences. Tornado is mainly distributed in Jiangsu, Guangdong and the Qiongzhou Strait.
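The organization described in the abstract (cases made up of samples, each sample carrying an event annotation plus six kinds of windowed data) can be pictured with a small schema. This is only an illustrative sketch: the class names, field names and string keys below are assumptions and are not the actual SCWDS file layout. The last function just reproduces the arithmetic implied by the reported totals, roughly 50 samples per case on average.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Five event types and six data sources named in the abstract.
# The string keys are hypothetical labels chosen for this sketch.
EVENT_TYPES = ("thunderstorm", "thunderstorm_gale",
               "short_term_heavy_rain", "hail", "tornado")
DATA_SOURCES = ("surface", "radiosonde", "lightning",
                "radar", "satellite", "reanalysis")


@dataclass
class Sample:
    """One sample: an event annotation plus data from the event's window."""
    event_type: str
    annotation: Dict[str, Any]
    data: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")
        unknown = set(self.data) - set(DATA_SOURCES)
        if unknown:
            raise ValueError(f"unknown data sources: {sorted(unknown)}")


@dataclass
class Case:
    """One case (station occurrence) holding its samples."""
    case_id: str
    samples: List[Sample] = field(default_factory=list)


def mean_samples_per_case(n_samples: int, n_cases: int) -> float:
    """Average number of samples per case."""
    return n_samples / n_cases


# Totals reported in the abstract: 9256405 samples in 184865 cases.
average = mean_samples_per_case(9256405, 184865)
```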
-
Fig. 1 An example of the spatial window definition and the observation data included for a severe convective weather event
(the blue circle marks the 200 km radius window and the red circle the 500 km radius window; shading denotes FY-2E long-wave infrared channel brightness temperature)
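Table 1 Definition of severe convective weather events
Severe convective weather type | Definition
Thunderstorm | An electrical discharge within a cumulonimbus cloud, between clouds, or between cloud and ground, manifested as lightning accompanied by thunder; sometimes only thunder is heard and no lightning is seen
Thunderstorm gale | Strong wind occurring under the influence of severe convective cloud clusters, with instantaneous wind speed reaching or exceeding 17.0 m·s⁻¹ and accompanied by thunder and lightning [22]
Short-term heavy rain | A short-duration heavy precipitation event caused by a convective weather system, containing at least one continuous 60-min period with accumulated precipitation of no less than 20 mm; the event starts at the first minute of the first such 60-min period and ends at the last minute of the last such 60-min period
Hail | Solid precipitation in the form of hard spherical, conical or irregular particles with a diameter of no less than 2 mm, often accompanying thunderstorms
Tornado | One of the most violent convective weather phenomena: a small-scale weather system with a very small horizontal extent but great destructive power, appearing as a small, intense vortex with a vertical axis that accompanies severe convective cloud. Its upper part is cumuliform cloud and its lower part is a pendant funnel-shaped cloud column. The bottom diameter is generally tens to several hundred meters and does not exceed 800 m, the path length is several hundred meters to several kilometers, and the maximum surface wind speed can reach 140 m·s⁻¹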
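The short-term heavy rain definition in Table 1 is procedural, so it can be sketched as a sliding-window check. The sketch below assumes a per-minute precipitation series in mm for a single station (the input format is an assumption) and treats the whole series as containing at most one event. The function and constant names are hypothetical. It follows the definition literally: the event runs from the first minute of the first qualifying 60-min window to the last minute of the last qualifying window.

```python
from typing import List, Optional, Tuple

WINDOW_MIN = 60        # length of the accumulation window, in minutes
THRESHOLD_MM = 20.0    # minimum 60-min accumulated precipitation, in mm


def heavy_rain_event(minute_precip: List[float]) -> Optional[Tuple[int, int]]:
    """Return (start, end) minute indices (inclusive) of the event, or None.

    A window starting at index s covers minutes s .. s + WINDOW_MIN - 1.
    """
    n = len(minute_precip)
    if n < WINDOW_MIN:
        return None

    qualifying_starts = []
    window_sum = sum(minute_precip[:WINDOW_MIN])
    for start in range(n - WINDOW_MIN + 1):
        if start > 0:
            # Slide the window one minute: add the new minute, drop the old one.
            window_sum += minute_precip[start + WINDOW_MIN - 1]
            window_sum -= minute_precip[start - 1]
        # Small tolerance guards against floating-point round-off.
        if window_sum >= THRESHOLD_MM - 1e-9:
            qualifying_starts.append(start)

    if not qualifying_starts:
        return None
    first, last = qualifying_starts[0], qualifying_starts[-1]
    return first, last + WINDOW_MIN - 1
```

Because the definition is expressed in terms of 60-min windows, the returned start can precede the first rainy minute: any window that accumulates 20 mm counts, even if its first minutes are dry. A production implementation would also need to split a long series into separate events, which this sketch does not attempt.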
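Table 2 Temporal and spatial window definition of weather condition data for severe convective weather events (in the temporal window, negative values are hours before the event begins and positive values are hours after the event ends)
Type of weather condition data | Spatial window | Temporal window
Surface observations | National-level stations within a circle of 200 km radius centered on the event location | [-2 h, +2 h]
Radiosonde data | Radiosonde stations within a circle of 500 km radius centered on the event location | [-24 h, +2 h]
Lightning location data | Lightning location data within a circle of 200 km radius centered on the event location | [-2 h, +2 h]
Radar base data | Doppler weather radar base data within a circle of 200 km radius centered on the event location | [-2 h, +2 h]
Satellite multi-channel data | Geostationary satellite multi-channel data within a square of 1000 km side length centered on the event location | [-2 h, +2 h]
Reanalysis products | Reanalysis products over China | [-2 h, +2 h]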
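The circular spatial windows and the temporal windows of Table 2 amount to a distance test plus a time-range test. The following is a minimal sketch under stated assumptions: events and observations are plain dictionaries with hypothetical keys (`lat`, `lon`, `start`, `end`, `time`), distance is computed with the haversine formula on a spherical Earth, and only the four circular windows are covered (the 1000 km satellite square and the China-wide reanalysis domain are left out).

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

# (radius in km, hours before event start, hours after event end), from Table 2
WINDOWS = {
    "surface": (200.0, 2, 2),
    "radiosonde": (500.0, 24, 2),
    "lightning": (200.0, 2, 2),
    "radar": (200.0, 2, 2),
}


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_window(kind: str, event: dict, obs: dict) -> bool:
    """True if the observation lies inside the event's space-time window."""
    radius_km, hours_before, hours_after = WINDOWS[kind]
    t0 = event["start"] - timedelta(hours=hours_before)
    t1 = event["end"] + timedelta(hours=hours_after)
    dist = great_circle_km(event["lat"], event["lon"], obs["lat"], obs["lon"])
    return dist <= radius_km and t0 <= obs["time"] <= t1


# Hypothetical event and observation, for illustration only.
event = {"lat": 30.0, "lon": 114.0,
         "start": datetime(2019, 6, 1, 14, 0), "end": datetime(2019, 6, 1, 15, 0)}
obs = {"lat": 31.0, "lon": 114.0, "time": datetime(2019, 6, 1, 12, 30)}
selected = in_window("surface", event, obs)
```

Keeping the window parameters in one table makes the asymmetric radiosonde window ([-24 h, +2 h], 500 km) a data difference rather than a separate code path.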
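Table 3 Description of data cleaning methods for severe convective weather events
Data cleaning method | Description
Incomplete data cleaning | ① Records whose spatiotemporal attributes are missing and cannot be filled by statistical methods are removed as missing data; ② missing physical intensity attributes are filled, following a spatial-consistency statistical method, with the intensity value observed at the same time at the nearest station
Inconsistent data cleaning | Data whose units, formats and similar conventions differ because of different observation periods or sources are converted to a uniform representation
Discontinuous data cleaning | Cleaning of events whose duration is too short and of events at the same location that are separated by too short an interval
Logical relationship error cleaning | Cleaning of data whose attribute values violate the logical relationships prescribed by operational rules
Non-convective weather event cleaning | Cleaning of weather events caused by non-convective factors
Internet data verification | The Yearbook of Meteorological Disasters in China for 2012-2019 [35], observations from national-level surface meteorological stations and the CMA disaster direct-reporting system serve as the reference; a weather event recorded by these sources is regarded as a true record
-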
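The two rules for incomplete data in Table 3 can be sketched as follows: drop records that lack time or location, then fill a missing intensity from the nearest station observing at the same time. This is a sketch under assumptions, not the authors' implementation: the record keys (`time`, `lat`, `lon`, `intensity`) and the `filled` flag are hypothetical, "nearest" is measured by haversine distance, and a record with no same-time donor keeps its missing intensity.

```python
import math
from typing import List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def distance_km(a: dict, b: dict) -> float:
    """Haversine distance between two records with 'lat' and 'lon' keys."""
    phi1, phi2 = math.radians(a["lat"]), math.radians(b["lat"])
    dphi = phi2 - phi1
    dlmb = math.radians(b["lon"] - a["lon"])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def clean_incomplete(records: List[dict]) -> List[dict]:
    """Apply the two incomplete-data rules and return new records."""
    # Rule 1: records without time or location cannot be repaired, so drop them.
    complete = [r for r in records
                if r.get("time") is not None
                and r.get("lat") is not None
                and r.get("lon") is not None]

    cleaned = []
    for rec in complete:
        if rec.get("intensity") is None:
            # Rule 2: donors are other records at the same time that have
            # an originally observed intensity (filled values are never reused).
            donors = [d for d in complete
                      if d is not rec
                      and d["time"] == rec["time"]
                      and d.get("intensity") is not None]
            if donors:
                nearest = min(donors, key=lambda d: distance_km(rec, d))
                rec = {**rec, "intensity": nearest["intensity"], "filled": True}
        cleaned.append(rec)
    return cleaned
```

Marking filled values with a flag keeps observed and imputed intensities distinguishable for later quality checks.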
[1] Tang W Y, Zhou Q L, Liu X H, et al. Analysis on verification of national severe convective weather categorical forecasts. Meteor Mon, 2017, 43(1): 67-76. https://www.cnki.com.cn/Article/CJFDTOTAL-QXXX201701007.htm
[2] Hitchens N M, Brooks H E. Evaluation of the Storm Prediction Center's day 1 convective outlooks. Wea Forecasting, 2012, 27(6): 1580-1585. doi: 10.1175/WAF-D-12-00061.1
[3] Gagne D J, McGovern A, Haupt S E, et al. Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea Forecasting, 2017, 32(5): 1819-1840. doi: 10.1175/WAF-D-17-0010.1
[4] Sun J, Cao Z, Li H, et al. Application of artificial intelligence technology to numerical weather prediction. J Appl Meteor Sci, 2021, 32(1): 1-11. doi: 10.11898/1001-7313.20210101
[5] Perler D, Marchand O. A study in weather model output postprocessing: Using the boosting method for thunderstorm detection. Wea Forecasting, 2009, 24(1): 211-222. doi: 10.1175/2008WAF2007047.1
[6] Lagerquist R, McGovern A, Smith T. Machine learning for real-time prediction of damaging straight-line convective wind. Wea Forecasting, 2017, 32(6): 2175-2193. doi: 10.1175/WAF-D-17-0038.1
[7] Marzban C, Witt A. A Bayesian neural network for severe-hail size prediction. Wea Forecasting, 2001, 16(5): 600-610. doi: 10.1175/1520-0434(2001)016<0600:ABNNFS>2.0.CO;2
[8] Marzban C, Stumpf G J. A neural network for tornado prediction based on Doppler radar-derived attributes. J Appl Meteor, 1996, 35(5): 617-626. doi: 10.1175/1520-0450(1996)035<0617:ANNFTP>2.0.CO;2
[9] Mecikalski J R, Williams J K, Jewett C P, et al. Probabilistic 0-1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J Appl Meteorol Climatol, 2015, 54(5): 1039-1059. doi: 10.1175/JAMC-D-14-0129.1
[10] Shi X J, Chen Z R, Wang H, et al. Convolutional LSTM network: A Machine Learning Approach for Precipitation Nowcasting//Proc 28th Int Conf on NIPS, 2015: 802-810.
[11] Han F, Long M S, Li Y A, et al. The application of recurrent neural network to nowcasting. J Appl Meteor Sci, 2019, 30(1): 61-69. doi: 10.11898/1001-7313.20190106
[12] Shi X J, Gao Z H, Lausen L, et al. Deep Learning for Precipitation Nowcasting: A Benchmark and New Model//Proc 31st Conf on NIPS, 2017: 5617-5627.
[13] Jing J, Li Q, Peng X. MLC-LSTM: Exploiting the spatiotemporal correlation between multi-level weather radar echoes for echo sequence extrapolation. Sensors, 2019, 19(18): 3988-4008. doi: 10.3390/s19183988
[14] Zhou K H, Zheng Y, Li B, et al. Forecasting different types of convective weather: A deep learning approach. J Meteor Res, 2019, 33(5): 797-809. doi: 10.1007/s13351-019-8162-6
[15] Su H, Deng J, Li F F. Crowdsourcing Annotations for Visual Object Detection. AAAI Human Computation Workshop, 2012. http://www.researchgate.net/publication/291249011_Crowdsourcing_annotations_for_visual_object_detection
[16] Liu B J, Zhang Y P, Li Z J, et al. An objective hailstorm labeling algorithm based on ground observation. J Appl Meteor Sci, 2021, 32(1): 78-90. doi: 10.11898/1001-7313.20210107
[17] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
[18] Dai A, Chang A X, Savva M, et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes//IEEE Conf on CVPR, 2017: 2432-2443.
[19] Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc, 2020, 146(730): 1999-2049. doi: 10.1002/qj.3803
[20] Zheng Y G, Zhou K H, Sheng J, et al. Advances in techniques of monitoring, forecasting and warning of severe convective weather. J Appl Meteor Sci, 2015, 26(6): 641-657. doi: 10.11898/1001-7313.20150601
[21] China Meteorological Administration. Specifications for Surface Meteorological Observation. Beijing: China Meteorological Press, 2003.
[22] Wang H, Li Y, Song L L, et al. Comparison of characteristics and environmental factors of thunderstorm gales over the Sichuan-Tibet Region. J Appl Meteor Sci, 2020, 31(4): 435-446. doi: 10.11898/1001-7313.20200406
[23] Wang B M. A study on synthetic differentiation method for basic meteorological data quality control. J Appl Meteor Sci, 2004, 15(Suppl I): 50-59. https://www.cnki.com.cn/Article/CJFDTOTAL-YYQX2004S1008.htm
[24] Ren Z H, Xiong A Y, Zou F L. The quality control of surface monthly climate data in China. J Appl Meteor Sci, 2007, 18(4): 516-523. http://qikan.camscma.cn/article/id/20070481
[25] Ren Z H, Zhang Z F, Sun C, et al. Development of three step quality control system of real time observation data from AWS in China. Meteor Mon, 2015, 41(10): 1268-1277. https://www.cnki.com.cn/Article/CJFDTOTAL-QXXX201510010.htm
[26] Wang H J, Liu Y. Comprehensive consistency method of data quality controlling with its application to daily temperature. J Appl Meteor Sci, 2012, 23(1): 69-76. http://qikan.camscma.cn/article/id/20120108
[27] Zhou S H. Quality control and technical method for producing data set for upper-air data in China. J Appl Meteor Sci, 2000, 11(3): 364-370. http://qikan.camscma.cn/article/id/20000353
[28] Ruan X, Xiong A Y, Hu K X, et al. Correcting geopotential height errors of some mandatory levels of Chinese historic radiosonde observations. J Appl Meteor Sci, 2015, 26(3): 257-267. doi: 10.11898/1001-7313.20150301
[29] Wen H, Liu L P, Zhang C A, et al. Operational evaluation of radar data quality control for ground clutter and electromagnetic interference. J Meteor Sci, 2016, 36(6): 789-799. https://www.cnki.com.cn/Article/CJFDTOTAL-QXKX201606009.htm
[30] Liu L P, Wu L L, Yang Y M. Development of fuzzy-logical two-step ground clutter detection algorithm. Acta Meteor Sinica, 2007, 65(2): 252-260. https://www.cnki.com.cn/Article/CJFDTOTAL-QXXB200702010.htm
[31] Tan X, Liu L P, Fan S R. Statistical characteristics of sea clutter and identification of sea clutter with CINRAD. Acta Meteor Sinica, 2013, 71(5): 962-975. https://www.cnki.com.cn/Article/CJFDTOTAL-QXXB201305015.htm
[32] Leng L, Huang X Y, Yang H P, et al. Recognition and application of Doppler weather radar clear air echoes. Meteor Sci Technol, 2012, 40(4): 24-31. https://www.cnki.com.cn/Article/CJFDTOTAL-QXKJ201204005.htm
[33] Xiao Y J, Wan Y F, Wang J, et al. Study of an automated Doppler radar velocity dealiasing algorithm. Plateau Meteor, 2012, 31(4): 1119-1128. https://www.cnki.com.cn/Article/CJFDTOTAL-GYQX201204028.htm
[34] Jiang Y. Meteorological Radar Data Quality Control Study and Application. Beijing: Chinese Academy of Meteorological Sciences, 2013.
[35] China Meteorological Administration. Yearbook of Meteorological Disasters in China (2013-2019). Beijing: China Meteorological Press, 2013-2019.