Constraint-Guided Vulnerability Detection Techniques for Machine Learning Frameworks
The growing use of machine learning (ML) to automate decision-making across many sectors raises serious concerns about vulnerabilities in ML frameworks. Such vulnerabilities can undermine the integrity and reliability of ML applications in critical domains. Testing these frameworks is notoriously difficult because of their complex implementations, which often mask vulnerabilities and make them hard to detect with conventional methods. Historically, fuzzing ML frameworks has met with limited success: the main obstacles are effectively extracting input constraints and generating valid inputs. Traditional approaches require prolonged fuzzing campaigns that are inefficient and still fail to reach the deeper, more complex execution paths where critical vulnerabilities may lie.

To address these challenges, this paper introduces ConFL (Constraint Fuzzy Lop), a constraint-guided fuzzer designed specifically for ML frameworks. ConFL automatically extracts constraints from framework source code, requiring no prior knowledge of the framework's internals and thereby lowering the barrier to testing. Guided by these constraints, ConFL generates valid inputs that are far more likely to pass the initial verification layers of an ML framework, allowing it to reach deeper into operator code paths and uncover vulnerabilities that remain hidden from traditional testing methods. ConFL also introduces a grouping technique that organizes the fuzzing process more systematically and improves its efficiency.

We evaluated ConFL primarily on the TensorFlow framework. ConFL covers more code lines and generates more valid inputs than state-of-the-art (SOTA) fuzzers, an efficiency gain that translates into more robust and secure ML applications. On known TensorFlow vulnerabilities, ConFL detects more than existing fuzzers. More importantly, ConFL has identified 84 previously unknown vulnerabilities across different versions of TensorFlow. These newly discovered vulnerabilities, including 3 of critical and 13 of high severity, have been significant enough to warrant new CVE (Common Vulnerabilities and Exposures) IDs. ConFL's versatility extends to other ML frameworks such as PyTorch and Paddle, where it has already found 7 vulnerabilities. In summary, ConFL's automated, constraint-guided approach makes fuzzing ML frameworks both more efficient and more effective at uncovering deep-seated vulnerabilities. As ML continues to permeate more sectors, tools like ConFL will be vital to ensuring the security and reliability of ML-driven systems.
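To make the idea of constraint-guided input generation concrete, below is a minimal, hypothetical sketch in Python. It assumes a hand-written constraint record for a single operator (tf.raw_ops.GatherV2) and a toy generator, gen_valid_gather_input, that samples only inputs satisfying those constraints; neither the constraint format nor the generator reflects ConFL's actual extraction or mutation logic, which the paper describes as automatic.

```python
# Minimal, hypothetical sketch of constraint-guided input generation (not ConFL's code).
import numpy as np
import tensorflow as tf

# Example constraints one might extract for tf.raw_ops.GatherV2:
#   - params: numeric dtype, rank >= 1
#   - indices: int32/int64, every value in [0, params.shape[axis])
#   - axis: scalar in [-rank(params), rank(params))
GATHER_CONSTRAINTS = {                      # hypothetical constraint record
    "params_dtypes": [np.float32, np.int32],
    "indices_dtypes": [np.int32, np.int64],
    "max_rank": 4,
    "max_dim": 8,
}

def gen_valid_gather_input(rng, c=GATHER_CONSTRAINTS):
    """Sample one (params, indices, axis) tuple that satisfies the constraints above."""
    rank = int(rng.integers(1, c["max_rank"] + 1))
    shape = tuple(int(d) for d in rng.integers(1, c["max_dim"] + 1, size=rank))
    params = rng.standard_normal(shape).astype(rng.choice(c["params_dtypes"]))
    axis = int(rng.integers(-rank, rank))            # stays within the valid axis range
    indices = rng.integers(0, shape[axis],           # index values stay in bounds
                           size=int(rng.integers(1, c["max_dim"] + 1)))
    indices = indices.astype(rng.choice(c["indices_dtypes"]))
    return params, indices, axis

rng = np.random.default_rng(0)
for _ in range(100):
    params, indices, axis = gen_valid_gather_input(rng)
    try:
        tf.raw_ops.GatherV2(params=tf.constant(params),
                            indices=tf.constant(indices),
                            axis=axis)
    except Exception as exc:
        # With valid inputs, Python-level errors are unexpected; crashes and
        # memory-safety bugs in the operator kernel are the real targets.
        print(type(exc).__name__, exc)
```

Because every sampled input already satisfies the operator's shape and dtype checks, the fuzzing budget is spent exercising the operator's kernel code rather than being rejected by input validation, which is the intuition behind the constraint-guided approach described above.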