Surprise Adequacy-Guided Deep Neural Network Test Inputs Generation

Guo Hongjing¹, Tao Chuanqi², Huang Zhiqiu³

Author information

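1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016
2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016; Key Laboratory of Safety-Critical Software Development and Verification (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016; State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023
3. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016; Key Laboratory of Safety-Critical Software Development and Verification (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016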

Abstract

Because deep neural network (DNN) models are complex and uncertain, thoroughly testing both their general and corner-case behaviors is an important means of ensuring model quality. Current research mainly defines coverage criteria and combines them with fuzz testing to generate follow-up test inputs, thereby improving test adequacy. However, few studies jointly consider the diversity and the individual fault-revealing ability of test inputs. Surprise adequacy quantifies the difference in neuron outputs between a test input and the training set. It is an important metric for assessing test adequacy, but no test input generation approach has yet been built on it. We therefore propose a surprise adequacy-guided test input generation approach for DNNs. First, the approach selects important neurons that contribute more to the decision result and uses their output values as features to improve the surprise adequacy metric. Second, it selects seed inputs with fault-revealing ability based on the improved surprise adequacy measurements. Finally, following the idea of coverage-guided fuzz testing, it takes the surprise adequacy value of a test input and the difference among the class probabilities predicted by the DNN model as a joint optimization objective, computes perturbations with gradient ascent, and iteratively generates test inputs. To validate the approach, 5 DNN models covering 4 different image datasets are used as test subjects. The experimental results show that the improved surprise adequacy metric effectively captures surprising test inputs while reducing computation time. For test input generation, compared with DeepGini and RobOT, the follow-up test set generated with the proposed seed selection strategy improves surprise coverage by up to 5.9 and 15.9 percentage points, respectively. Compared with DLFuzz and DeepXplore, the proposed approach improves surprise coverage by up to 26.5 and 33.7 percentage points, respectively.
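The two quantities the abstract relies on, surprise adequacy and surprise coverage, can be illustrated with a minimal sketch. It assumes the standard distance-based surprise adequacy (DSA) formulation from the earlier surprise adequacy literature, not the improved metric proposed in this paper, whose details the abstract does not give. The function names, the `important_idx` parameter (a hypothetical list of indices of already-selected important neurons), and the equal-width bucket scheme are illustrative assumptions.

```python
import numpy as np


def dsa(test_act, test_label, train_acts, train_labels, important_idx=None):
    """Distance-based surprise adequacy of one test input (illustrative sketch).

    test_act:      1-D activation trace of the test input
    test_label:    class predicted for the test input
    train_acts:    2-D array, one activation trace per training input
    train_labels:  1-D array of training classes
    important_idx: hypothetical indices of important neurons (assumption);
                   when given, only those neurons are used as features
    """
    if important_idx is not None:
        test_act = test_act[important_idx]
        train_acts = train_acts[:, important_idx]

    same = train_acts[train_labels == test_label]
    other = train_acts[train_labels != test_label]

    # Distance to the nearest training trace of the predicted class.
    d_same = np.linalg.norm(same - test_act, axis=1)
    nearest = same[np.argmin(d_same)]
    dist_a = d_same.min()

    # Distance from that reference trace to the nearest trace of any other class.
    dist_b = np.linalg.norm(other - nearest, axis=1).min()
    return dist_a / dist_b


def surprise_coverage(sa_values, upper, n_buckets):
    """Fraction of equal-width buckets of (0, upper] hit by at least one SA value."""
    sa = np.asarray(sa_values, dtype=float)
    sa = sa[(sa > 0) & (sa <= upper)]  # values outside the range hit no bucket
    bucket = np.ceil(sa / upper * n_buckets).astype(int) - 1
    return len(np.unique(bucket)) / n_buckets
```

A larger DSA value means the input lies far from its own class and close to a class boundary in activation space, which is why high-surprise inputs are plausible seed candidates. Restricting the trace to a subset of neurons also shrinks the distance computation, which is consistent with the reduced time cost the abstract reports, although the paper's actual selection rule is not reproduced here.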

Key words

software testing/test input generation/test coverage/deep neural network/surprise adequacy

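Funding

Key Program of the National Natural Science Foundation of China (U224120044)

National Natural Science Foundation of China (62202223)

Natural Science Foundation of Jiangsu Province (BK20220881)

Open Fund of the State Key Laboratory for Novel Software Technology (KFKT2021B32)

Fundamental Research Funds for the Central Universities (NT2022027)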

Publication year

2024

Journal of Computer Research and Development

Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 2.649

ISSN: 1000-1239

Number of references: 36
