Support Vector Data Description in Rainstorm Prediction in Northwest China
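Summary: Traditional machine learning usually assumes implicitly that the classes in a problem are balanced. Forecasting disastrous weather in meteorology violates this assumption, because the class that matters is the rare positive (minority) class. Because traditional machine learning aims to maximize accuracy, on an imbalanced problem it tends to produce a trivial classifier that assigns every instance to the negative (majority) class. Support vector data description (SVDD) is a kernel-based machine learning method derived from the support vector machine (SVM); it can be trained on samples of a single class and is therefore suited to imbalanced classes. Using rainstorm prediction in Tongchuan as the test case, SVM and SVDD were compared experimentally. The results show that SVDD has the advantage on this imbalanced-class problem.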
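For orientation, the one-class idea behind SVDD can be written as the standard formulation of Tax and Duin, sketched here under assumed notation. None of the symbols below appear in the text above: $x_i$ are the $n$ training examples of the target class, $\phi$ is the kernel feature map, $a$ and $R$ are the center and radius of the enclosing hypersphere, $\xi_i$ are slack variables, and $C$ is the trade-off parameter.

```latex
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & \lVert \phi(x_i) - a \rVert^{2} \le R^{2} + \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
% A new example z is accepted as a member of the target class when
% \lVert \phi(z) - a \rVert^{2} \le R^{2}, and rejected otherwise.
```

Only target-class examples enter the constraints, which is why the method can be trained without any examples of the other class.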
Keywords:
- machine learning;
- support vector data description (SVDD);
- support vector machine (SVM);
- rainstorm prediction
Abstract: Expert systems (ES) have been widely studied and applied in meteorology. An ES depends on knowledge engineers to enter the knowledge that the computer uses for inference, which is laborious and error-prone work. Machine learning, another branch of artificial intelligence (AI), aims to acquire knowledge automatically and so offers a way to remedy this shortcoming of ES. However, machine learning still does not work well unless it is tailored to the characteristics of weather forecasting, among which class imbalance is an important problem that deserves study. Although the machine learning research community usually assumes implicitly that classes are well balanced, in many domains one class is represented by a large number of examples while the other is represented by only a few, and many applications need to classify important but rare positive (minority) examples. Predicting disastrous weather such as hail and rainstorms is a typical case of learning from an imbalanced training set. Although such events have low probability, they cause serious destruction, so meteorologists pay much more attention to predicting disastrous weather than to predicting normal weather. Normally, examples of normal weather far outnumber examples of disastrous weather.
Because they aim to maximize accuracy, traditional machine learning algorithms faced with an imbalanced class distribution tend to produce a trivial classifier that labels every example as the majority class, since doing so already yields high accuracy. Class imbalance is therefore a stumbling block for practical attempts to apply machine learning to realistic problems. To find algorithms that are resistant to an imbalanced class distribution, the threat score (TS) is used as the criterion for evaluating classifiers. Although SVM, a kernel method, is based on statistical learning theory and works well in many applications, it also fails to deal with the class imbalance problem: it leans toward the majority class (corresponding to normal weather) and misses the very important disastrous weather. Support vector data description (SVDD) is another important kernel method that originated from SVM. As a one-class method, it is trained on examples of the target class only, tries to capture the characteristics of that class, and is resistant to class imbalance. A comparative study of SVDD and SVM is conducted on rainstorm prediction in Tongchuan City, Shaanxi Province. The experiment shows that SVM is evidently biased toward the majority class and produces many false negatives. When the normal weather class is selected as the target, the TS of SVDD is superior to that of SVM. The result agrees with the theoretical analysis of SVDD and SVM. The results show that SVDD is a better choice than traditional methods such as SVM for imbalanced class problems, and that better performance can be obtained if the class with more examples is chosen as the target class.
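The gap between accuracy and the threat score on imbalanced data can be illustrated with a minimal Python sketch. It assumes the usual verification definition TS = hits / (hits + misses + false alarms). The 100-day record with 5 rainstorm days and both forecasts are made-up illustrative data, not the Tongchuan data, and the names `Contingency`, `tally`, `accuracy` and `threat_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Contingency:
    """2x2 verification counts for a yes/no event forecast."""
    hits: int               # event forecast and observed
    misses: int             # event observed but not forecast
    false_alarms: int       # event forecast but not observed
    correct_negatives: int  # event neither forecast nor observed


def tally(observed: Sequence[bool], forecast: Sequence[bool]) -> Contingency:
    """Count the four outcomes over paired observations and forecasts."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    c = Contingency(0, 0, 0, 0)
    for obs, fc in zip(observed, forecast):
        if obs and fc:
            c.hits += 1
        elif obs:
            c.misses += 1
        elif fc:
            c.false_alarms += 1
        else:
            c.correct_negatives += 1
    return c


def accuracy(c: Contingency) -> float:
    """Fraction of all cases classified correctly (assumes at least one case)."""
    total = c.hits + c.misses + c.false_alarms + c.correct_negatives
    return (c.hits + c.correct_negatives) / total


def threat_score(c: Contingency) -> float:
    """TS = hits / (hits + misses + false alarms); correct negatives are ignored.

    Returning 0.0 when the denominator is zero is a convention assumed here.
    """
    denom = c.hits + c.misses + c.false_alarms
    return c.hits / denom if denom else 0.0


# Hypothetical record: 5 rainstorm days followed by 95 normal days.
observed = [True] * 5 + [False] * 95
# Trivial classifier: never forecasts a rainstorm.
trivial = [False] * 100
# Imperfect but useful forecast: catches 3 of 5 storms, raises 4 false alarms.
useful = [True] * 3 + [False] * 2 + [True] * 4 + [False] * 91

for name, forecast in (("trivial", trivial), ("useful", useful)):
    c = tally(observed, forecast)
    print(f"{name}: accuracy={accuracy(c):.2f}, TS={threat_score(c):.2f}")
```

On this made-up record the trivial forecast reaches an accuracy of 0.95 with a TS of 0, while the forecast that actually catches storms has a slightly lower accuracy (0.94) and a TS of 1/3. Because the TS ignores correct negatives, it cannot be inflated by the large normal-weather class.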
Table 1 First set of forecast experiment results for the two methods
Table 2 Second set of forecast experiment results for the two methods