Stock Rating Analysis Based on a Fast-MCD Robust Discriminant Model
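Duan Yubing (段昱兵) 1, Ma Shaojuan (马少娟) 2
Author information
- 1. School of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021
- 2. School of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021; Ningxia Key Laboratory of Intelligent Information and Big Data Processing, Yinchuan, Ningxia 750021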
Abstract
With the rapid development of information technology, each data record carries more and more information, so data sets inevitably contain outliers, and the likelihood of outliers grows with the volume of data. First, comparative experiments on simulated data show that traditional distance discriminant analysis is highly sensitive to outliers: when the proportion of outliers in the data rises to 30%, the traditional method performs poorly. Second, the Fast-MCD approach is used to obtain a robust estimate of the population covariance matrix, which improves on traditional distance discrimination; numerical simulation shows that distance discriminant analysis based on the Fast-MCD estimate is markedly resistant to outliers. Finally, 2021 annual report data on the stocks of 800 A-share companies are used as training and test sets, and the test set is classified with both the robust and the traditional distance discriminant methods for comparison. The results show that the Fast-MCD robust distance discriminant method is more accurate.
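The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data. It assumes a per-class Mahalanobis distance rule, uses scikit-learn's `MinCovDet` (an implementation of Fast-MCD) as the robust estimator and `EmpiricalCovariance` as the classical one, and runs on a hypothetical two-class Gaussian simulation in which 30% of one class's training points are replaced by distant outliers. The helper names (`make_data`, `fit_classes`, `predict`, `run_experiment`), the class centers, and the sample sizes are all illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet


def make_data(rng, n_per_class, outlier_frac=0.0):
    """Two Gaussian classes; a fraction of class 0 is replaced by distant outliers.

    Centers, spread, and outlier location are assumptions for illustration only.
    """
    x0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(n_per_class, 2))
    n_out = int(outlier_frac * n_per_class)
    if n_out > 0:
        x0[:n_out] = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(n_out, 2))
    X = np.vstack([x0, x1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def fit_classes(X, y, make_estimator):
    """Fit one location/covariance estimator per class label."""
    return {int(k): make_estimator().fit(X[y == k]) for k in np.unique(y)}


def predict(models, X):
    """Assign each row to the class with the smallest squared Mahalanobis distance."""
    labels = np.array(list(models.keys()))
    d2 = np.column_stack([m.mahalanobis(X) for m in models.values()])
    return labels[np.argmin(d2, axis=1)]


def run_experiment(seed=0):
    rng = np.random.default_rng(seed)
    # Training set with 30% contamination in class 0; test set is clean.
    X_train, y_train = make_data(rng, 200, outlier_frac=0.30)
    X_test, y_test = make_data(rng, 500)

    classical = fit_classes(X_train, y_train, EmpiricalCovariance)
    robust = fit_classes(X_train, y_train, lambda: MinCovDet(random_state=seed))

    classical_acc = float(np.mean(predict(classical, X_test) == y_test))
    robust_acc = float(np.mean(predict(robust, X_test) == y_test))
    return classical_acc, robust_acc


classical_acc, robust_acc = run_experiment(seed=0)
print(f"classical distance discriminant accuracy: {classical_acc:.3f}")
print(f"Fast-MCD distance discriminant accuracy:  {robust_acc:.3f}")
```

In this setup the outliers pull the classical mean of the contaminated class toward the other class and inflate its covariance, whereas the MCD estimate is fitted to the most concentrated subset of points and is largely unaffected. The actual discriminant rule, covariance pooling, and contamination scheme used in the paper may differ.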
Keywords
Fast-MCD estimation / distance discrimination / robust statistics / statistical simulation
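Funding
National Statistical Science Research Project (2020LY046)
Research Project on Major Practical Issues in Serving National Strategy and Ethnic Affairs (MYJKS17)
Year of publication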
2024