Application of Support Vector Data Description to Rainstorm Prediction in Northwest China

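  • Summary: Traditional machine learning usually assumes implicitly that the classes in the problem under study are balanced. Forecasting disastrous weather violates this assumption, because the class that must be predicted is the important but rare positive (minority) class. Because traditional machine learning aims to maximize accuracy, on imbalanced problems it tends to produce a trivial classifier that assigns every example to the negative (majority) class. Support vector data description (SVDD) is a kernel-based machine learning method derived from the support vector machine (SVM). It can work with examples from a single class, which makes it suitable for imbalanced classes. Taking rainstorm prediction for Tongchuan as the test case, SVM and SVDD were compared experimentally. The results show that SVDD has the advantage on this imbalanced-class problem.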
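The text does not state the SVDD optimization problem. For orientation, the formulation commonly given for SVDD can be sketched as follows. The symbols are assumptions introduced here, not taken from the paper: $a$ is the center and $R$ the radius of a hypersphere in feature space, $\phi$ is the kernel-induced feature map, $\xi_i$ are slack variables, $C$ is a trade-off parameter, and $x_1, \dots, x_n$ are the training examples of the target class.

```latex
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^{2} + C \sum_{i=1}^{n} \xi_{i} \\
\text{subject to}\quad & \lVert \phi(x_{i}) - a \rVert^{2} \le R^{2} + \xi_{i}, \\
                       & \xi_{i} \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

Under this sketch, a new example $x$ is accepted as belonging to the target class when $\lVert \phi(x) - a \rVert^{2} \le R^{2}$ and is treated as an outlier otherwise. Only target-class examples enter the constraints, which is why the method can be trained on one class alone.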

     

    Abstract: Expert systems (ES) have been widely studied and applied in meteorology. An ES depends on knowledge engineers to enter the knowledge that the computer uses for inference, which is laborious and error-prone work. Machine learning, another branch of artificial intelligence (AI), aims to acquire knowledge automatically and so offers a way to remedy this shortcoming of ES. However, machine learning still does not work well unless it is tailored to the characteristics of weather forecasting, among which class imbalance is an important problem that deserves study. Although the machine learning research community usually assumes implicitly that classes are well balanced, there are many domains in which one class is represented by a large number of examples while the other is represented by only a few, and many applications require classifying important but rare positive (minority) examples. Predicting disastrous weather such as hail and rainstorms is a typical example of learning from an imbalanced training set. Although such events have low probability, they cause serious destruction, so meteorologists pay much more attention to predicting disastrous weather than to predicting normal weather. Normally, examples of normal weather far outnumber examples of disastrous weather.
Because they aim to maximize accuracy, traditional machine learning algorithms faced with an imbalanced class distribution tend to produce a trivial classifier that labels every example as the majority class; doing so still yields high accuracy. Class imbalance is therefore a stumbling block for practical attempts to apply machine learning to realistic problems. To find algorithms that are resistant to imbalanced class distributions, the threat score (TS) is used as the criterion for evaluating classifiers. Although SVM, a kernel method, is based on statistical learning theory and works well in many applications, it also fails to deal with the imbalanced-class problem: it inclines toward the majority class (corresponding to normal weather) and misses the very important disastrous weather. Support vector data description (SVDD) is another important kernel method, derived from SVM. As a one-class method, it uses only training examples of the target class and tries to capture the characteristics of that class, which makes it resistant to class imbalance. A comparative study of SVDD and SVM is conducted on rainstorm prediction in Tongchuan City, Shaanxi Province. The experiment shows that SVM is clearly biased toward the majority class and produces many false negatives. When the normal weather class is selected as the target class, the TS of SVDD is superior to that of SVM, which agrees with the theoretical analysis of SVDD and SVM. The results show that SVDD is a better choice than traditional methods such as SVM for imbalanced-class problems, and that better performance can be obtained if the class with more examples is chosen as the target class.
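The contrast between accuracy and the threat score can be illustrated with a minimal sketch. It assumes the standard definition TS = hits / (hits + misses + false alarms), which the abstract does not spell out, and it uses made-up labels rather than the Tongchuan data. The function and variable names are hypothetical.

```python
def threat_score(y_true, y_pred, positive=1):
    """Threat score: hits / (hits + misses + false alarms).

    Correct predictions of the negative class do not enter the score.
    """
    pairs = list(zip(y_true, y_pred))
    hits = sum(1 for t, p in pairs if t == positive and p == positive)
    misses = sum(1 for t, p in pairs if t == positive and p != positive)
    false_alarms = sum(1 for t, p in pairs if t != positive and p == positive)
    denom = hits + misses + false_alarms
    return hits / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of examples labeled correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical sample: 100 days, 5 rainstorm days (1) and 95 normal days (0).
y_true = [1] * 5 + [0] * 95

# Trivial classifier: every day is labeled "normal".
trivial = [0] * 100

# Hypothetical useful classifier: catches 3 of the 5 rainstorms
# and raises 4 false alarms on normal days.
useful = [1, 1, 1, 0, 0] + [1] * 4 + [0] * 91

acc_trivial = accuracy(y_true, trivial)
ts_trivial = threat_score(y_true, trivial)
acc_useful = accuracy(y_true, useful)
ts_useful = threat_score(y_true, useful)

print(f"trivial: accuracy={acc_trivial:.2f}, TS={ts_trivial:.2f}")
print(f"useful:  accuracy={acc_useful:.2f}, TS={ts_useful:.2f}")
```

On this made-up sample the trivial classifier has the higher accuracy but a threat score of zero, because it never predicts a rainstorm. This is the behavior that motivates evaluating classifiers by TS instead of accuracy.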

     
