Sequence-based Program Semantic Rule Mining and Violation Detection Method

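Abstract (translated from the Chinese): In software development, source code that violates semantic rules can still compile and run normally yet contain performance or functional defects, so detecting such defects accurately is a challenge. Existing work usually adopts itemset-based rule mining and detection methods, but because these methods do not combine the order information and control-flow information of source code well, they leave considerable room for improvement in both detection capability and accuracy. To address this problem, a sequence-based method for program semantic rule extraction and violation detection, SPUME, is proposed. The method converts program source code into intermediate-representation sequences, uses a sequential rule mining algorithm to extract semantic rules from them, and detects defects in the source code based on those rules. To validate SPUME, it is compared with three baseline methods: PR-Miner, Tikanga, and Bugram. Experimental results show that, compared with PR-Miner, which mines rules from unordered itemsets, and Tikanga, which incorporates a graph model, SPUME achieves significant improvements in detection effectiveness, detection speed, and accuracy. Compared with Bugram, which is based on an N-gram language model, SPUME efficiently detects more program defects at comparable accuracy.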

English title: Sequence-based Program Semantic Rule Mining and Violation Detection

English abstract: In software development, source code that violates semantic rules may compile and run normally yet still have defects in performance or functionality. Accurately detecting such defects is therefore a challenge. Existing research usually adopts itemset-based rule mining and detection methods, but these methods leave significant room for improvement in detection ability and accuracy because they fail to integrate the order information and control-flow information of source code effectively. To address this problem, this paper proposes a sequence-based method called SPUME for extracting program semantic rules and detecting violations of them. The method converts program source code into an intermediate representation sequence, extracts semantic rules from it using sequence rule mining algorithms, and detects defects in the source code based on these rules. To verify the effectiveness of SPUME, it is compared with three baseline methods: PR-Miner, Tikanga, and Bugram. Experimental results show that, compared with PR-Miner, which is based on unordered itemset mining, and Tikanga, which incorporates graph models, SPUME significantly improves detection performance, speed, and accuracy. Compared with Bugram, which is based on N-gram language models, SPUME detects more program defects more efficiently while maintaining a similar level of accuracy.
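The pipeline the abstract outlines (turn code into token sequences, mine ordered rules, flag sequences that break them) can be illustrated with a deliberately simplified sketch. This is not SPUME's algorithm: the rule form (ordered pairs "a is later followed by b"), the thresholds, the function names `mine_rules` and `find_violations`, and the toy tokens such as `lock` and `unlock` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(sequences, min_support=3, min_confidence=0.75):
    """Mine ordered pair rules (a, b), read as 'a is later followed by b'.

    Hypothetical simplification: real sequential rule miners handle
    longer antecedents and consequents than single tokens.
    """
    item_count = Counter()  # number of sequences containing each token
    pair_count = Counter()  # number of sequences in which a precedes b
    for seq in sequences:
        item_count.update(set(seq))
        # combinations() preserves positional order, so a precedes b here.
        ordered_pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        pair_count.update(ordered_pairs)

    rules = {}
    for (a, b), support in pair_count.items():
        confidence = support / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def find_violations(sequences, rules):
    """Report (sequence index, a, b) where a occurs but no b follows it."""
    violations = []
    for index, seq in enumerate(sequences):
        for a, b in rules:
            if a in seq and b not in seq[seq.index(a) + 1:]:
                violations.append((index, a, b))
    return violations


# Toy "intermediate representation" sequences, one per function (made up).
corpus = [
    ["lock", "read", "unlock"],
    ["lock", "write", "unlock"],
    ["lock", "seek", "unlock"],
    ["lock", "read", "write", "unlock"],
    ["lock", "read"],  # the unlock call is missing
]

rules = mine_rules(corpus)
violations = find_violations(corpus, rules)
```

On this toy corpus the only rule that clears both thresholds is `lock` followed by `unlock`, and the last sequence is the only one reported as violating it. An unordered itemset view would see the same co-occurrence but could not express that `unlock` has to come after `lock`, which is the kind of order information the abstract says itemset-based methods leave out.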

English keywords:

Semantic rule mining; Overlapping clustering; Defect detection

Authors:

LI Zi (李孜), ZHOU Yu (周宇)

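Author affiliations:

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016

Key Laboratory of Software Development and Verification Technology for High-Safety Systems, Ministry of Industry and Information Technology, Nanjing 211100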

Keywords (translated from the Chinese):

semantic rule mining; overlapping clustering; defect detection

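Funding:

National Natural Science Foundation of China; National Defense Basic Scientific Research Program; Natural Science Foundation of Jiangsu Province

Project numbers:

61972197; JCKY2022605C006; BK20201292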

Publication year:

2024

DOI:

10.11896/jsjkx.230300224

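Computer Science (计算机科学)

Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)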


CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(6)

Number of references: 23