
Research on a Financial Anomaly Data Extraction Method Based on Random Forest

Because financial data are unstable and abnormal financial data take many forms, extracting abnormal data is difficult. This paper therefore proposes an extraction method based on random forest. Attributes of the financial data, including information gain, information storage capacity, and search engine, are first identified and used to build a decision tree. The Gini coefficient is used to calculate the category value of the data at each node of the tree, and data of the same category are grouped into the same level. If the resulting feature distribution is disordered, all data in the data set are split, and the process iterates until the most accurate result is obtained. A selective ensemble algorithm is then introduced into the decision tree: according to the obtained feature values, data with the same features are assigned to the same subcategory to keep the features consistent. Finally, the features of abnormal data are input into the decision tree, and the abnormal financial data values are extracted by feature search. Simulation experiments show that the proposed method extracts abnormal data with high accuracy and a low false detection rate, and reaches good results with the fewest iterations.
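The node-level Gini calculation and data splitting described in the abstract can be illustrated with a minimal sketch. It assumes the standard Gini impurity, 1 - sum(p_k^2), and a single numeric feature; the abstract does not give the paper's exact formulation, so the function names `gini_impurity` and `best_split` and the toy transaction data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity 1 - sum(p_k^2) for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the threshold on one numeric feature that minimizes the
    size-weighted Gini impurity of the two resulting subsets.

    Returns (threshold, weighted_gini); threshold is None when no split
    improves on the unsplit node.
    """
    n = len(values)
    best = (None, gini_impurity(labels))
    # Every distinct value except the largest is a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: transaction amounts, label 1 = anomalous record.
amounts = [120.0, 135.0, 128.0, 9800.0, 10250.0, 131.0]
labels = [0, 0, 0, 1, 1, 0]

threshold, score = best_split(amounts, labels)
print(threshold, score)
```

A random forest would repeat this kind of split search over bootstrap samples and random feature subsets, then combine the resulting trees; the selective ensemble step mentioned in the abstract is not reproduced here.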

random forest decision tree; Gini coefficient; information storage capacity; subcategory; recall

Ye Zhengjuan (叶正娟)


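Department of Economics and Management, Hefei Vocational College of Science and Technology (合肥科技职业学院), Hefei, Anhui 230000, China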


Key Research Project in Humanities and Social Sciences of Anhui Higher Education Institutions

SK2018A1030

2024

Journal of Huaiyin Teachers College (Natural Science Edition)
Huaiyin Normal University


Impact factor: 0.259
ISSN: 1671-6876
Year, volume (issue): 2024, 23(1)