Abstract:
An important problem with the current forecasting platform is that, although the system provides a large amount of data (over 2 GB, several thousand weather fields per day), forecasters use only a small fraction of it (less than 1%) in operational forecasting. Another important issue is how to give the system flexible data management so that forecasters can use historical data efficiently. A data warehouse is a good solution to both problems. A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. The data in the warehouse are processed data, called "analytic data", that correspond to the original operational data. "Subjects" are the objects to be analyzed in weather forecasting, e.g. the concepts in a forecaster's experience. "Analytic data" are the actual values of the subjects, obtained by transforming the original operational data according to the subject definitions (for weather forecasting, transformations based on the weather system are the most important). A data warehouse is built by creating a subject system from the concept set of the forecasters' knowledge, defining for each subject in that system the transformation that turns operational data into analytic data, and running the transformation program on the operational data every day to obtain real-time analytic data and save them in a database. The data warehouse is thus a set of analytic data. In this way, going from concept to subject to analytic data, the data used in analysis directly match the concepts in the forecaster's mind, which makes data analysis faster and allows more data to be used in operational forecasting.
There are two important analysis tools in a data warehouse. Data Mining (DM) is an exploratory tool: the DM system automatically explores the relationships among the subjects (i.e., the forecasters' concepts) from the analytic data set.
The resulting relationships are saved in the knowledge base of the data warehouse and reinforce the forecasters' knowledge. Mining of association rules is noteworthy because it is sometimes more reasonable than linear regression analysis. On-line analytical processing (OLAP) is the other analysis tool, an interactive validation tool. Forecasters use it to view data, validate relationships (both their own conjectures and the results from DM) and then make forecast decisions. Its core technology is multi-dimensional analysis. Specifically for weather forecasting, "Compare analysis", "Multi-analysis" and "Analog analysis", all based on multi-dimensional analysis, are also used in OLAP. OLAP will be the forecasters' main workbench in the data warehouse.
The data warehouse uses metadata. Data management and maintenance become easier and more flexible, and historical data and heterogeneous data, such as special observation data and even Internet data, become easier for applications to use. The bottleneck of traditional knowledge-base systems is knowledge acquisition. In a data warehouse, forecasters first put their concepts into the subject system, then obtain relationships between concepts from DM (or manually enter certain relationships) and validate them with OLAP. The forecasters' knowledge is thus used systematically in the forecast process, and the bottleneck is eased.
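
As an illustration of the association-rule mining mentioned above, the following is a minimal sketch that assumes the analytic data have already been reduced to one record per day listing which subjects were present. The subject names (`upper_trough`, `low_level_jet`, `heavy_rain`), the toy records, the thresholds and the function names are all hypothetical and chosen only for illustration. The brute-force search over subject pairs stands in for whatever mining algorithm the DM system actually uses, which the abstract does not specify.

```python
from itertools import permutations

# Toy analytic data: one record per day, listing the subjects (forecaster
# concepts) present that day. Names and values are invented for illustration;
# they are not real observations.
DAYS = [
    {"upper_trough", "low_level_jet", "heavy_rain"},
    {"upper_trough", "low_level_jet", "heavy_rain"},
    {"upper_trough", "heavy_rain"},
    {"low_level_jet"},
    {"upper_trough", "low_level_jet"},
    set(),
]


def support(itemset, days):
    """Fraction of days on which every subject in `itemset` is present."""
    itemset = set(itemset)
    return sum(itemset <= day for day in days) / len(days)


def mine_pair_rules(days, min_support=0.3, min_confidence=0.7):
    """Return rules A -> B between single subjects.

    Each rule is a tuple (A, B, support, confidence), where
    support = P(A and B) and confidence = P(A and B) / P(A).
    The thresholds are assumed values, not taken from the source.
    """
    subjects = sorted(set().union(*days))
    rules = []
    for a, b in permutations(subjects, 2):
        s_ab = support({a, b}, days)
        s_a = support({a}, days)
        if s_a == 0 or s_ab < min_support:
            continue
        confidence = s_ab / s_a
        if confidence >= min_confidence:
            rules.append((a, b, s_ab, confidence))
    return rules


if __name__ == "__main__":
    for a, b, s, c in mine_pair_rules(DAYS):
        print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Rules found this way would be candidates for the knowledge base, to be checked interactively by the forecaster in OLAP before being used in a forecast. Unlike a regression coefficient, each rule reads directly as a statement about the forecaster's own concepts.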