A Discussion of the Application of Data Warehouse Technology in Weather Forecast Decision-Making
Data Warehouse and Its Potential in Weather Forecast
Summary: This paper outlines the concept and characteristics of the data warehouse. It discusses the main technical problems to be solved in data warehouse storage, on-line analytical processing (OLAP), and data mining (DM), with an emphasis on the application of data warehouse technology to weather forecasting. Data warehouse technology transforms raw data into data that are convenient to analyze and strengthens the ability to manage and use historical data and special observation data. DM can help forecasters accumulate experience quickly, and OLAP frees forecasters' analysis from the limits of previously fixed frameworks. In view of the characteristics of weather forecast decision-making, the paper proposes extensions such as data aggregation centered on weather system analysis and the addition of comparison analysis, multivariate analysis, and analog analysis to OLAP's multi-dimensional analysis. It also points out that association rule mining is a new approach worth trying in current research on forecasting methods.

Abstract: An important problem with the forecasting platforms that forecasters currently use is that, although the system provides a large amount of data (over 2 GB, or several thousand weather fields, per day), forecasters use only a small fraction of it (less than 1%) in operational forecasting. Another important issue is how to give the system flexible data management so that forecasters can use historical data efficiently. The data warehouse is a good solution to these problems.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process. The data in the warehouse are processed data, called "analytic data", as opposed to the original operational data. "Subjects" are defined as the objects to be analyzed in weather forecasting, e.g., the concepts in forecasters' experience. "Analytic data" are the actual values of the subjects, obtained by transforming the original operational data according to the subject definitions (for weather forecasting, transformation based on the weather system is the most important). A data warehouse is built in three steps: creating a subject system from the concept set of forecasters' knowledge; defining, for each subject in that system, the transformation that turns operational data into analytic data; and running the transformation program on operational data every day to produce real-time analytic data and save them in a database. A data warehouse is thus a collection of analytic data.
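The build steps described above (subject system, per-subject transformation, daily run) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Subject` class, the two example subjects, the field keys `mslp` and `z500`, and the in-memory `warehouse` list standing in for the analytic database. None of it comes from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Grid = List[List[float]]  # one weather field as rows of grid-point values


@dataclass(frozen=True)
class Subject:
    """A forecaster concept plus the rule that derives its value from raw fields."""
    name: str
    transform: Callable[[Dict[str, Grid]], float]


def field_min(grid: Grid) -> float:
    """Smallest grid-point value in a field."""
    return min(min(row) for row in grid)


def field_mean(grid: Grid) -> float:
    """Arithmetic mean over all grid points of a field."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


# Hypothetical subject system: each entry maps a concept to a transformation.
SUBJECTS = [
    Subject("low_center_pressure_hpa", lambda f: field_min(f["mslp"])),
    Subject("mean_500hpa_height_gpm", lambda f: field_mean(f["z500"])),
]


def build_analytic_record(date: str, fields: Dict[str, Grid]) -> Dict[str, object]:
    """Turn one day of operational fields into one row of analytic data."""
    record: Dict[str, object] = {"date": date}
    for subject in SUBJECTS:
        record[subject.name] = subject.transform(fields)
    return record


# Stand-in for the analytic database; a real system would persist the rows.
warehouse: List[Dict[str, object]] = []

# Invented 2x2 fields for one day (sea-level pressure in hPa, 500 hPa height in gpm).
fields_today = {
    "mslp": [[1012.0, 1008.0], [1004.0, 1010.0]],
    "z500": [[5520.0, 5560.0], [5580.0, 5540.0]],
}
warehouse.append(build_analytic_record("2005-06-01", fields_today))
```

Keeping each transformation next to its subject name means adding a new forecaster concept only requires one new `Subject` entry; the daily run itself does not change.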
Following this path from concept to subject to analytic data, the data used in analysis directly match the concepts in the forecaster's mind, which speeds up data analysis and lets more data be used in operational forecasting.

The data warehouse has two important analysis tools. Data mining (DM) is an exploratory tool: the DM system automatically explores the relationships among subjects (i.e., forecasters' concepts) in the analytic data set. The resulting relationships are saved in the knowledge base of the data warehouse and reinforce the forecaster's knowledge. Mining of association rules is noteworthy because it is sometimes more reasonable than linear regression analysis. On-line analytical processing (OLAP) is the other analysis tool, an interactive validation tool. Forecasters use it to view data, validate relationships (both their own conjectures and results from DM), and then make forecast decisions. Its core technology is multi-dimensional analysis. For weather forecasting in particular, comparison analysis, multivariate analysis, and analog analysis, all built on multi-dimensional analysis, are also used in OLAP. OLAP will be the forecaster's main workbench in the data warehouse.

The data warehouse uses metadata. Data management and maintenance become easier and more flexible, and historical data and heterogeneous data such as special observation data, and even Internet data, become easier for applications to use. The bottleneck of traditional knowledge base systems is knowledge acquisition. In a data warehouse, forecasters first put their concepts into the subject system, then obtain relationships between concepts from DM (or enter known relationships manually), and validate them with OLAP. Forecasters' knowledge is thus used systematically in the forecast process, and the bottleneck is eased.
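To make the idea of mining association rules among subjects concrete, here is a small Python sketch that scores rules by support and confidence. It is a brute-force enumeration for illustration only, not an optimized algorithm such as Apriori or FP-growth, and not the method used in the paper. The subject names and the five sample days are invented.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# Each day is the set of subjects (forecaster concepts) observed that day.
# These records are invented for illustration.
days: List[FrozenSet[str]] = [
    frozenset({"trough_500hpa", "low_level_jet", "heavy_rain"}),
    frozenset({"trough_500hpa", "low_level_jet", "heavy_rain"}),
    frozenset({"trough_500hpa", "heavy_rain"}),
    frozenset({"low_level_jet"}),
    frozenset({"trough_500hpa", "low_level_jet"}),
]

Rule = Tuple[FrozenSet[str], str, float, float]  # antecedent, consequent, support, confidence


def count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def mine_rules(transactions: List[FrozenSet[str]],
               min_support: float,
               min_confidence: float) -> List[Rule]:
    """Enumerate all rules 'antecedent -> single consequent' above both thresholds."""
    items = sorted(set().union(*transactions))
    total = len(transactions)
    rules: List[Rule] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            hits = count(itemset, transactions)
            if hits == 0 or hits / total < min_support:
                continue  # itemset too rare to support a rule
            for consequent in combo:
                antecedent = itemset - {consequent}
                # confidence = P(consequent | antecedent), estimated from counts
                confidence = hits / count(antecedent, transactions)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, hits / total, confidence))
    return rules


rules = mine_rules(days, min_support=0.3, min_confidence=0.7)
for antecedent, consequent, sup, conf in rules:
    print(f"{sorted(antecedent)} -> {consequent}  support={sup:.2f} confidence={conf:.2f}")
```

Unlike a linear regression coefficient, each rule states a conditional frequency between discrete concepts (for example, how often heavy rain accompanied a 500 hPa trough in the sample), which is closer to how a forecaster would phrase an empirical rule.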
Key words:
- weather forecast;
- data warehouse;
- OLAP;
- data mining