DeepGenFuzz: An Efficient PDF Application Fuzzing Test Case Generation Framework Based on Deep Learning
PDF is a widely used and important document format. Because PDF files are complex, defects in PDF-related applications can lead to serious consequences such as malicious attacks and incorrect rendering of information, so testing these applications has become an active research topic. The most effective method at present is grammar-based fuzz testing, but it often requires substantial manual work to summarize and write complex grammar rules, which seriously hinders efficient automation of test case generation. Deep learning techniques offer a feasible solution to this challenge. However, test cases generated by current methods are generally of low quality and have poor bug-finding ability. Improving on this requires addressing three main challenges: filtering the data set, balancing gains in test case coverage against growth in test case size, and mutating test cases efficiently. This paper therefore proposes DeepGenFuzz, a deep learning-based framework for efficiently generating fuzz test cases for PDF applications. It uses models such as CNN, Seq2Seq, and Transformer to generate high-quality PDF test cases through steps including data filtering, object generation, object appending, and efficient mutation. Evaluations on PDF applications such as MuPDF show that the test cases generated by DeepGenFuzz achieve significantly higher average code coverage than those of state-of-the-art tools such as Learn&Fuzz and IUST-DeepFuzz, with improvements of 8.12% to 61.03%. Its bug-finding capability is also far superior to that of Learn&Fuzz and IUST-DeepFuzz. To date, 31 previously unreported bugs have been discovered in the seven most popular PDF applications, of which 25 have been confirmed or fixed, covering all tested programs.
PDF application; Deep learning; Fuzz testing; Test cases; Code coverage