Research on Key Technologies of Distributed Training for Level2 Market Data Factor Mining

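Level2 market quotation data is the new generation of real-time market data products from the Shanghai and Shenzhen Stock Exchanges and an upgraded version of ordinary basic market data. It is currently the market data in China with the highest information density and the greatest information content, yet it is the least thoroughly mined, and it is of significant value for uncovering potential risks in the securities market. However, existing research lacks risk measurement and computational analysis of the securities market based on Level2 data. Moreover, full-market Level2 data is large in scale, and the deep learning models used to extract information from it are growing ever more complex; although hardware computing power keeps improving, it still cannot resolve problems such as long training times and low efficiency. Therefore, based on Level2 market quotation data for the CSI 300 constituent stocks, high-frequency volatility factors are mined using deep learning and other methods, and a high-frequency volatility prediction model based on TabNet and LightGBM is built. In addition, Parallel_DE, a distributed training algorithm based on parallel differential evolution, is proposed for parameter computation during distributed model training, and its scenario mapping scheme and overall process design are described in detail. Both pieces of work are thoroughly validated on an in-house distributed training platform. The experimental results show that the high-frequency volatility prediction model predicts realized volatility with high accuracy and has some advantage over other methods; the Parallel_DE algorithm effectively reduces the error of local parameters on the test set while retaining parameter diversity to a certain degree, and thus trains well-performing deep learning models efficiently in a distributed manner. This provides techniques and methods based on Level2 data for risk identification in the securities market.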
Research on key technologies of distributed training for Level2 market quotation factor mining
Level2 market quotation data is the new generation of real-time market data products from the Shanghai and Shenzhen Stock Exchanges. Serving as an enhanced version of basic market data, it currently has the highest information density and the greatest information content of any market data in China, and it is the least thoroughly mined. The data is of significant value in identifying potential risks in the securities market, but existing research lacks risk measurement and analysis based on it. Moreover, the scale of Level2 market quotation data across the entire market is large, and the deep learning models used to extract information are becoming increasingly complex. Although hardware computing power is constantly improving, it still cannot solve problems such as long training times and low efficiency. Therefore, based on Level2 market quotation data for the CSI 300 constituent stocks, deep learning and other methods are used to mine high-frequency volatility factors, and a high-frequency volatility prediction model based on TabNet and LightGBM is built. At the same time, a distributed training algorithm, Parallel_DE, based on parallel differential evolution is proposed for parameter calculation during distributed model training, and its scenario mapping scheme and overall process design are elaborated. Both pieces of work are fully verified on the authors' in-house distributed training platform. The experimental results show that the high-frequency volatility prediction model can predict realized volatility with high precision and has certain advantages over other methods; the Parallel_DE algorithm can effectively reduce the error of local parameters on the test set while retaining parameter diversity to a certain extent, and can thus efficiently train a well-performing deep learning model in a distributed manner. This paper provides technologies and methodologies for leveraging Level2 market quotation data in risk identification within the securities market.
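
For readers unfamiliar with differential evolution, the following is a minimal sketch of the classic single-process DE/rand/1/bin scheme that such algorithms build on. It is not the Parallel_DE algorithm itself: the abstract does not describe how workers or model parameters are mapped onto the population, so nothing here is modelled on that design. The function name `differential_evolution`, the `sphere` toy loss, and all hyperparameter values (`pop_size`, `f`, `cr`, `generations`, `bounds`) are illustrative assumptions.

```python
import random

def differential_evolution(loss, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=200, bounds=(-5.0, 5.0), seed=0):
    """Generic textbook DE/rand/1/bin minimiser (hypothetical helper)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population of candidate parameter vectors.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [loss(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][j])  # inherit from the parent
            trial_fit = loss(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

def sphere(x):
    """Toy loss (assumption): sum of squares, minimised at the origin."""
    return sum(v * v for v in x)

best_x, best_loss = differential_evolution(sphere, dim=3)
```

Because selection only ever replaces an individual with a trial that is no worse, the population keeps several distinct candidates alive while the best loss never increases, which is the general property behind the abstract's remark about reducing error while retaining parameter diversity.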

Level2 market quotation; realized volatility; distributed training; differential evolution

Zhao Xinbo (赵鑫博), Lu Zhonghua (陆忠华)

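School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang, Liaoning 110854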

Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083

University of Chinese Academy of Sciences, Beijing 100049

Level2 market data; realized volatility; distributed training; differential evolution

2024

Computer Engineering & Science
School of Computer Science, National University of Defense Technology

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.787
ISSN: 1007-130X
Year, volume (issue): 2024, 46(9)