Evaluating Spectrum-Based Fault Localization on Deep Learning Libraries


Abstract: Deep learning (DL) libraries have become increasingly popular and their quality assurance is also gaining significant attention. Although many fault detection techniques have been proposed, effective fault localization techniques tailored to DL libraries are scarce. Due to the unique characteristics of DL libraries (e.g., complicated code architecture supporting DL model training and inference with extensive multidimensional tensor calculations), the effectiveness of existing fault localization techniques for traditional software is also unknown on DL library faults. To bridge this gap, we conducted the first empirical study to investigate the effectiveness of fault localization on DL libraries. Specifically, we evaluated spectrum-based fault localization (SBFL) due to its high generalizability and affordable overhead on such complicated libraries. Based on the key aspects in SBFL, our study investigated the effectiveness of SBFL with different sources of passing test cases (including human-written, fuzzer-generated, and mutation-based test cases) and various suspicious value calculation methods. In particular, mutation-based test cases are produced by our designed rule-based mutation technique and LLM-based mutation technique tailored to DL library faults. To enable our extensive study, we built the first benchmark (Defects4DLL), which contains 120 real-world faults in PyTorch and TensorFlow with easy-to-use experimental environments. Our study delivered a series of useful findings. For example, the rule-based approach is effective in localizing crash faults in DL libraries, successfully localizing 44.44% of crash faults within Top-10 functions and 74.07% of crash faults within Top-10 files, while the passing test cases from DL library fuzzers perform poorly on this task. Furthermore, based on our findings on the complementarity of different sources, we designed a hybrid technique by effectively integrating human-written, LLM-mutated, and rule-based mutated test cases, which further achieves improvements of 31.48% to 61.36% over each single source in terms of the number of detected faults within Top-5 files.
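To make the SBFL idea in the abstract concrete, the following is a minimal sketch of how a suspiciousness score and a Top-N check can be computed from test coverage. It uses Ochiai, one widely used SBFL formula; the abstract does not say which formulas the study evaluated, so this choice is an assumption. The function names (`conv2d`, `pad`, `check_shape`, `relu`), the coverage sets, and the helpers `ochiai`, `rank_elements`, and `in_top_n` are all hypothetical and are not taken from the paper or from Defects4DLL.

```python
import math
from typing import List, Set, Tuple


def ochiai(ef: int, ep: int, total_failed: int) -> float:
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)).

    ef: failing tests that cover the element
    ep: passing tests that cover the element
    """
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0


def rank_elements(
    failing: List[Set[str]], passing: List[Set[str]]
) -> List[Tuple[str, float]]:
    """Rank covered elements (e.g. functions) by score, highest first."""
    elements = set().union(*failing, *passing)
    scores = {}
    for element in elements:
        ef = sum(element in cov for cov in failing)
        ep = sum(element in cov for cov in passing)
        scores[element] = ochiai(ef, ep, len(failing))
    # Break ties by name so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def in_top_n(ranking: List[Tuple[str, float]], faulty: str, n: int) -> bool:
    """True if the faulty element appears among the first n ranked elements."""
    return faulty in [name for name, _ in ranking[:n]]


# Hypothetical coverage: one failing (crash) test and two passing tests.
failing = [{"conv2d", "pad", "check_shape"}]
passing = [{"conv2d", "pad"}, {"conv2d", "relu"}]

ranking = rank_elements(failing, passing)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this toy setup `check_shape` is covered only by the failing test, so it ranks first. Passing tests that cover the same code as the failing test except for the faulty element are what push that element up the ranking, which is one way to see why the source of passing test cases matters for SBFL.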

Keywords:

Libraries; Location awareness; Benchmark testing; Computer crashes; Deep learning; Training; Tensors; Codes; Fault location; Runtime environment

Authors:

Ming Yan, Junjie Chen, Tianjie Jiang, Jiajun Jiang, Zan Wang


Author affiliations:

College of Intelligence and Computing, Tianjin University, Tianjin, China

School of New Media and Communication, Tianjin University, Tianjin, China

Publication year:

2025

DOI:

10.1109/TSE.2025.3552622

IEEE Transactions on Software Engineering

ISSN:

Year, volume (issue): 2025, 51(5)

Number of references: 62