Comprehensive Consistency Method of Data Quality Controlling with Its Application to Daily Temperature
Abstract: Historical daily temperature data underpin climate analysis and climate change research, so their quality is receiving increasing attention. In China, daily temperature data are currently quality controlled with traditional methods, and there is no systematic, comprehensive method for picking out the outliers hidden in the historical record. Because these erroneous values affect applications of the data, a new quality control method is needed.

Using a linear regression model and daily temperature data (mean, maximum and minimum temperature) from neighbouring reference stations over the same period, a quality check algorithm based on linear regression estimation is designed. The algorithm combines a time consistency check and a spatial consistency check. To further improve detection performance, a comprehensive consistency check method is developed on top of this algorithm. It adds an internal consistency check that refers to the variation of related meteorological elements, such as daily temperature (mean, maximum and minimum), precipitation and sunshine duration.

In a seeded-error test, the linear regression algorithm detects erroneous data better than the spatial regression test used alone. It can detect suspicious values that differ from the correct daily temperature by about 3 ℃.

Quality control practice and analysis of historical data show that the comprehensive consistency check method has three advantages. First, the rate of Type Ⅰ errors is lower, which reduces false detections, that is, correct data flagged as erroneous. Second, the logical relationship among time consistency, internal consistency and spatial consistency is preserved, and the three checks are applied together as a whole. Third, weather factors are taken into account, which reduces the number of correct values wrongly flagged because of small-scale weather phenomena. The comprehensive consistency method therefore detects erroneous data better than traditional quality control methods.

The method performs well when applied to daily temperature data for 1961 to 2009 from 251 weather stations in Hubei, Hunan and Henan provinces in central China. The outlier detection rate is 0.001% for mean temperature, 0.05% for maximum temperature and 0.04% for minimum temperature.
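The regression-based check and the seeded-error test described in the abstract can be illustrated with a minimal sketch. The abstract does not define the quality control parameter, so everything here is an assumption: `qc_parameter` takes f to be the observation minus a regression estimate from the neighbouring stations, divided by a pooled regression error, with the neighbour estimates weighted by inverse squared error in the style of a spatial regression test. The function names and the synthetic data from `make_synthetic` are hypothetical and are not the authors' implementation or data.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b, rmse)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(sse / n)


def qc_parameter(target, neighbours, day):
    """Assumed definition: (observation - estimate) / pooled regression error."""
    weighted_sum = weight_total = 0.0
    for series in neighbours:
        a, b, rmse = fit_line(series, target)
        weight = 1.0 / max(rmse, 1e-6) ** 2  # guard against a perfect fit
        weighted_sum += weight * (a + b * series[day])
        weight_total += weight
    estimate = weighted_sum / weight_total
    pooled_error = math.sqrt(len(neighbours) / weight_total)
    return (target[day] - estimate) / pooled_error


def detection_rate(target, neighbours, seeded_error, threshold=3.0):
    """Fraction of days with |f| >= threshold when one error is seeded per trial."""
    hits = 0
    for day in range(len(target)):
        trial = list(target)
        trial[day] += seeded_error  # seed the error on this day only
        if abs(qc_parameter(trial, neighbours, day)) >= threshold:
            hits += 1
    return hits / len(target)


def make_synthetic(days=120, stations=4, seed=0):
    """Synthetic data only: a shared regional signal plus station noise."""
    rng = random.Random(seed)
    regional = [15 + 8 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 3)
                for d in range(days)]
    target = [r + rng.gauss(0, 0.5) for r in regional]
    neighbours = [[r + 0.5 * (j + 1) + rng.gauss(0, 0.5) for r in regional]
                  for j in range(stations)]
    return target, neighbours


if __name__ == "__main__":
    target, neighbours = make_synthetic()
    for error in (0.0, 1.0, 2.0, 3.0, 5.0):
        print(error, detection_rate(target, neighbours, error))
```

A seeded error of 0.0 reproduces the baseline row of the tables, that is, the share of unmodified observations that already exceed the threshold. The rates printed for synthetic data are not comparable with the published percentages.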
Table 1 Percentage of seeded-error data with quality control parameter |f_i| ≥ 3 (linear regression method)

Seeded error/℃   Mean temperature/%   Maximum temperature/%   Minimum temperature/%
0.0              0.4                  0.5                     0.4
1.2              53.8                 18.1                    15.6
1.4              69.6                 28.1                    25.1
1.6              82.2                 41.8                    39.1
1.8              89.4                 53.7                    50.8
2.0              93.8                 65.0                    63.1
2.2              96.4                 74.6                    73.0
2.4              97.9                 81.6                    80.1
2.6              98.8                 87.2                    85.6
2.8              99.2                 91.2                    90.1
3.0              99.5                 94.3                    92.4

Note: a seeded error of 0.0 ℃ gives the proportion of the actual observations with |f_i| ≥ 3.
Table 2 Percentage of seeded-error data with quality control parameter |f_i| ≥ 3 (spatial regression test)

Seeded error/℃   Mean temperature/%   Maximum temperature/%   Minimum temperature/%
0.0              0.7                  0.6                     0.9
2.2              65.4                 22.4                    33.2
2.4              72.0                 30.6                    40.6
2.6              77.0                 39.2                    47.7
2.8              80.9                 48.0                    54.3
3.0              84.1                 56.1                    60.2
3.2              86.8                 63.4                    65.4
3.4              89.3                 69.4                    69.8
3.6              91.4                 74.6                    74.0
3.8              93.3                 78.6                    77.5
4.0              94.9                 81.9                    80.7

Note: a seeded error of 0.0 ℃ gives the proportion of the actual observations with |f_i| ≥ 3.
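Table 3 Daily meteorological elements at Yunxi Station and its neighbouring reference stations on 6 May 1992

Element                  Yunxi   Zhuxi   Yunxian   Zhushan   Fangxian   Laohekou
Mean temperature/℃       23.2    21.2    20.0      21.9      20.4       18.6
Maximum temperature/℃    27.5    24.6    22.0      24.8      23.8       22.9
Minimum temperature/℃    20.2    19.8    18.1      20.6      18.4       17.6
Sunshine duration/h      2.3     0.0     0.0       0.0       0.0        0.0
Precipitation/mm         0.0     5.0     7.5       20.2      53.1       37.5

Note: Zhuxi, Yunxian, Zhushan, Fangxian and Laohekou are the neighbouring reference stations.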
Table 4 Flagging rules of the internal consistency check

QC parameter of checked element   QC parameter of reference element   QC flag before correction   QC flag after correction
|f_i| ≥ 7                         |f_i| ≥ 3                           F=4                         F=1
5 ≤ |f_i| < 7                     2 ≤ |f_i| < 3                       F=3                         F=1
3 ≤ |f_i| < 5                     1 ≤ |f_i| < 2                       F=2                         F=1
|f_i| < 3                         |f_i| < 1                           F=0                         F=0
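Table 5 Daily meteorological elements at Yunmeng Station and its neighbouring reference stations on 22 Dec 1985

Element                  Yunmeng   Jingshan   Anlu   Yingcheng   Xiaogan   Hanchuan
Mean temperature/℃       0.8       0.6        0.2    0.9         1.3       1.9
Maximum temperature/℃    9.4       6.9        5.3    5.8         4.8       5.4
Minimum temperature/℃    -1.8      -2.5       -1.4   -1.5        -1.4      -0.9
Sunshine duration/h      0.0       1.6        0.0    1.2         0.0       0.0
Precipitation/mm         0.0       0.0        0.0    0.0         0.0       0.0

Note: Jingshan, Anlu, Yingcheng, Xiaogan and Hanchuan are the neighbouring reference stations.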
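The flagging rules of Table 4 can be read as a two-step lookup: the checked element's quality control parameter sets an initial flag, and the reference element's parameter decides whether that flag is corrected to F=1. The sketch below is one possible reading, not the authors' code. The function names are hypothetical, and one behaviour is an assumption that the table does not state: when the reference parameter falls outside the range listed in the same row, the initial flag is kept unchanged.

```python
# Reference-element range that corrects each initial flag to 1 (Table 4 rows).
REFERENCE_RANGE = {
    4: (3.0, float("inf")),  # checked |f| >= 7, reference |f| >= 3
    3: (2.0, 3.0),           # checked 5 <= |f| < 7, reference 2 <= |f| < 3
    2: (1.0, 2.0),           # checked 3 <= |f| < 5, reference 1 <= |f| < 2
}


def initial_flag(f_checked):
    """Flag before correction, from the checked element's QC parameter."""
    magnitude = abs(f_checked)
    if magnitude >= 7:
        return 4
    if magnitude >= 5:
        return 3
    if magnitude >= 3:
        return 2
    return 0


def corrected_flag(f_checked, f_reference):
    """Flag after correction, using the reference element's QC parameter."""
    before = initial_flag(f_checked)
    if before == 0:
        return 0  # last row of Table 4: nothing to correct
    low, high = REFERENCE_RANGE[before]
    if low <= abs(f_reference) < high:
        return 1  # the reference element deviates as listed in the same row
    return before  # assumption: combinations not listed in Table 4 are unchanged


if __name__ == "__main__":
    for pair in [(7.5, 3.2), (7.5, 0.5), (5.5, 2.5), (3.5, 1.5), (2.0, 0.2)]:
        print(pair, initial_flag(pair[0]), corrected_flag(*pair))
```

Keeping the row ranges in one dictionary makes it easy to swap in a different reading of the table, for example treating each reference range as a lower bound only.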