Journal of Southwest Minzu University (Natural Science Edition), 2024, Vol. 50, Issue 4: 418-427. DOI: 10.11920/xnmdzk.2024.04.009

Graph feature selection method of software defect prediction based on TMFG-generated topology diagrams

Cui Mengtian (崔梦天) 1, Chen Jianying (陈建英) 1, Xu Zhihui (徐智慧) 1

Author information

  • 1. School of Computer Science and Engineering, Southwest Minzu University, Chengdu, Sichuan 610041

Abstract

Software defect prediction is an important means of reducing software testing costs, and feature selection is a key step in it. However, traditional feature selection algorithms consider only pairwise relationships and correlations between features, and cannot effectively handle more complex multilateral relationships and multidirectional interactions. To address this issue, this paper proposes a graph-based feature selection method for software defect prediction that uses TMFG (Triangulated Maximally Filtered Graph). The method first introduces a topological graph into the feature selection algorithm: features are represented as nodes of the graph, and symmetric uncertainty is used as the measure of feature relevance, yielding a fully connected feature graph. The TMFG edge-removal algorithm then removes some of the edges from the fully connected graph, and graph clustering is performed. Next, the features within each cluster are ranked, and a specified number of features is selected from each cluster and combined to form the final feature subset. Finally, comparative experiments on datasets from the PROMISE repository show that the proposed method further improves the quality of the selected feature subset, with a greater advantage on larger datasets.
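The first two steps described in the abstract (building the fully connected feature graph from symmetric uncertainty, then filtering it with TMFG) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes discrete (already binned) feature values, uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and uses a simplified seed for the greedy TMFG construction. The function names `entropy`, `symmetric_uncertainty`, `su_matrix` and `tmfg_edges` are hypothetical. The clustering and ranking steps are omitted because the abstract does not specify them.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # assumption: two constant features are treated as unrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def su_matrix(columns):
    """Edge weights of the fully connected feature graph (symmetric, zero diagonal)."""
    n = len(columns)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = symmetric_uncertainty(columns[i], columns[j])
    return w


def tmfg_edges(w):
    """Greedy TMFG filtering; returns the kept edges as a set of (i, j) with i < j."""
    n = len(w)
    if n < 4:
        raise ValueError("TMFG needs at least 4 nodes")
    # Simplified seed (assumption): the 4 nodes with the largest total weight
    # form the initial tetrahedron.
    seed = sorted(range(n), key=lambda i: sum(w[i]), reverse=True)[:4]
    edges = {tuple(sorted(e)) for e in combinations(seed, 2)}
    faces = [tuple(f) for f in combinations(seed, 3)]
    remaining = set(range(n)) - set(seed)
    while remaining:
        # Choose the (node, triangular face) pair with the largest summed weight.
        v, face = max(
            ((v, f) for v in remaining for f in faces),
            key=lambda vf: sum(w[vf[0]][u] for u in vf[1]),
        )
        a, b, c = face
        # Connect the node to the three corners of the face ...
        edges.update(tuple(sorted((v, u))) for u in face)
        # ... and replace that face with the three new triangles.
        faces.remove(face)
        faces.extend([(a, b, v), (a, c, v), (b, c, v)])
        remaining.remove(v)
    return edges
```

Calling `tmfg_edges(su_matrix(columns))` on a list of feature columns returns the retained edges. For n features the filtered graph keeps 3(n - 2) of the n(n - 1)/2 edges of the fully connected graph, and a community detection algorithm would then be run on this sparser graph.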

Key words

software defect prediction/feature selection/topological graph/community detection algorithm/TMFG (Triangulated Maximally Filtered Graph)

Funding

Sichuan Science and Technology Program (2023YFH0057)

Sichuan Science and Technology Program (2023YFN0026)

Publication year

2024
Journal of Southwest Minzu University (Natural Science Edition)
Southwest Minzu University

CSTPCD
Impact factor: 0.441
ISSN: 2095-4271