
Deep-level Mining Methods for Multi-source Heterogeneous Data in Software Development Information Repository

Software development involves collaboration among multiple teams and individuals, so project documents often contain inconsistencies, errors, or omissions. If these problems are not found and handled promptly, they seriously affect the efficiency and quality of software development. To retrieve the required data accurately and to improve developer productivity and development speed, a deep-level mining method for multi-source heterogeneous data in software development information repositories is proposed. First, missing values and noisy data in the multi-source heterogeneous data collected from different sources are handled on the basis of time series. Next, features are extracted from the processed data and fed into a self-organizing map (SOM) neural network, which clusters the data. The ATPRK method is then used to predict the demand for multi-source heterogeneous data in the software information repository, and the clusters output by the SOM network are clustered again according to this prediction, which completes the deep-level mining. Experimental results show that the method mines 99% of the multi-source heterogeneous data in the software development information repository and effectively removes data that is not needed. Clustering accuracy is highest when the number of clusters is 16, and the minimum clustering entropy is only 0.31, indicating good deep-level mining performance.
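The SOM clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's network configuration, feature set, or training schedule, so everything here is an assumption: a generic self-organizing map on a 4x4 grid (chosen only because 16 clusters is the best-performing count reported), a linearly decaying learning rate and neighbourhood radius, and the hypothetical helper names `train_som`, `best_matching_unit`, `assign_clusters`, and `demo_samples`. The ATPRK prediction and the second clustering pass are not modelled.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)),
    )


def train_som(samples, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM and return {(row, col): weight vector}.

    Assumes every feature has already been scaled to [0, 1].
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Each grid node starts with a random weight vector in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]          # shuffle a copy, leave the input untouched
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # radius shrinks, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])    # pull the node toward the sample
            step += 1
    return weights


def assign_clusters(weights, samples):
    """Map each sample to the grid node (cluster) that best matches it."""
    return [best_matching_unit(weights, x) for x in samples]


def demo_samples(n_per_group=20, seed=1):
    """Hypothetical 2-D feature vectors forming two separated groups."""
    rng = random.Random(seed)
    low = [[rng.uniform(0.05, 0.25), rng.uniform(0.05, 0.25)] for _ in range(n_per_group)]
    high = [[rng.uniform(0.75, 0.95), rng.uniform(0.75, 0.95)] for _ in range(n_per_group)]
    return low + high
```

Calling `assign_clusters(train_som(demo_samples()), demo_samples())` gives one grid coordinate per sample, and each of the 16 coordinates plays the role of a cluster. In a real pipeline the input vectors would be the features extracted from the preprocessed repository data rather than synthetic points.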


Yu Ping (于平)


广州华南商贸职业学院, Guangzhou, Guangdong 510000

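Keywords: software development; multi-source heterogeneous data; data mining; data preprocessing; feature extraction; data clustering; SOM neural network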

2024

武汉工程职业技术学院学报
武汉工程职业技术学院


Impact factor: 0.311
ISSN: 1671-3524
Year, volume (issue): 2024, 36(1)