Research on Vulnerability Information Retrieval System Based on Pre-Training Model
[Research purpose] Vulnerability information in threat intelligence refers to information about the presence of vulnerabilities in networks, systems, applications, or supply chains. Current search engines have shortcomings when used for vulnerability information retrieval. Using a pre-trained model to build a vulnerability retrieval system can improve retrieval efficiency. [Research method] Using public vulnerability information as the data source, we construct a question answering dataset and incrementally pre-train TinyBERT. The model is used to vectorize each query, the vulnerability information is built into a Faiss vector database, and an HNSW index is used for multi-channel and single-channel recall retrieval. Then, two-tower and single-tower models are obtained by fine-tuning with contrastive learning, and a simple knowledge retrieval system is constructed using two-tower recall and single-tower fine ranking. [Research conclusion] The experimental results show that retrieval performance can be significantly improved by using the pre-trained model, and the Top-1 recall rate of the two-tower model fine-tuned with contrastive learning is 92.17% on the constructed vulnerability information test set. This retrieval practice in the field of vulnerability information provides a reference for building retrieval systems for threat intelligence.
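The recall-then-rank design summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and every name and sample record in it is hypothetical. Under these assumptions, a bag-of-words `embed` function stands in for the fine-tuned two-tower TinyBERT encoder, a brute-force cosine search stands in for the Faiss HNSW index, and a token-overlap `cross_score` stands in for the single-tower ranking model. Only the control flow (independent encoding for recall, joint scoring for fine ranking, Top-1 recall as the metric) reflects the described system.

```python
import math
import re

# Hypothetical vulnerability descriptions; real data would come from public sources.
docs = {
    "doc-1": "buffer overflow in image parser allows remote code execution",
    "doc-2": "sql injection in login form exposes user database",
    "doc-3": "cross site scripting in comment field of web application",
    "doc-4": "privilege escalation through misconfigured service permissions",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

# Toy vocabulary built from the corpus (assumption: replaces a learned tokenizer).
vocab = {tok: i for i, tok in enumerate(sorted({t for d in docs.values() for t in tokenize(d)}))}

def embed(text):
    """Stand-in for the two-tower encoder: L2-normalized bag-of-words vector."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Documents are encoded once, independently of any query (the "document tower").
doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}

def recall(query, k):
    """Coarse recall: top-k documents by cosine similarity (brute force, not HNSW)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), doc_id) for doc_id, v in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

def cross_score(query, doc_text):
    """Stand-in for the single-tower model: scores the (query, document) pair jointly."""
    q, d = set(tokenize(query)), set(tokenize(doc_text))
    return len(q & d) / len(q | d) if (q | d) else 0.0

def search(query, k=3):
    """Two-stage retrieval: two-tower style recall, then single-tower style fine ranking."""
    candidates = recall(query, k)
    return sorted(candidates, key=lambda i: cross_score(query, docs[i]), reverse=True)

def top1_recall(labelled_queries):
    """Fraction of queries whose first result is the labelled document."""
    hits = sum(1 for query, gold in labelled_queries if search(query)[0] == gold)
    return hits / len(labelled_queries)

if __name__ == "__main__":
    print(search("sql injection login"))
    print(top1_recall([("sql injection login", "doc-2"),
                       ("buffer overflow image parser", "doc-1")]))
```

The split mirrors the usual trade-off: the recall stage compares precomputed vectors and so scales to a large collection, while the ranking stage looks at query and document together and is applied only to the few recalled candidates.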