Research on detection and classification of fraudulent websites based on multi-dimensional features

With the development and widespread use of the Internet, the tactics and anti-detection techniques of fraud groups have become increasingly advanced, and the detection and classification of fraudulent websites have become more important for cyberspace security. Traditional detection techniques, however, can no longer cope with new types of fraudulent websites, and there is very little research on classifying such sites. To address this problem, this paper analyzes several typical features of today's new fraudulent websites and proposes a multi-dimensional feature-based system for detecting and classifying them. The system represents a fraudulent website with 11 types of website features and 3600 web page keywords. It first uses a crawler to obtain the web page screenshot, WHOIS information, and source code of the domain to be checked, and passes them to the feature extraction module, which builds a multi-dimensional feature set. The detection module takes the website's domain name, code structure, and WHOIS information as features and builds a random forest model to perform the detection task. Then, based on the detection results, the web page classification module uses a bidirectional GRU to extract the textual features of the page. When the confidence is below 0.7, the module uses a BERT model instead, which preserves accuracy while keeping the system efficient. In addition, a residual neural network extracts features from the web page screenshot, the similarity between the images inside the page and the website logo is computed, and a random forest model performs the classification. Comparison experiments were designed to further analyze the accuracy of the model. The experiments show that the proposed model achieves high accuracy, with an average F1-score of 97.28%. The results also show that the multi-dimensional feature model distinguishes well between fraudulent and legitimate websites, overcomes the difficulty traditional methods have in identifying new fraudulent websites, and is suitable for the rapid detection and classification of fraudulent websites among newly registered domains worldwide.
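The confidence-gated text step described in the abstract can be sketched as a two-stage cascade. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the confidence is the top class probability of the bidirectional GRU classifier, and the names `classify_text`, `toy_bigru`, `toy_bert` and the category labels are hypothetical stand-ins. Only the 0.7 threshold comes from the abstract.

```python
from typing import Callable, Dict, Tuple

# A text classifier maps page text to per-class probabilities.
Classifier = Callable[[str], Dict[str, float]]

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in the abstract


def top_label(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]


def classify_text(
    text: str,
    fast_model: Classifier,
    strong_model: Classifier,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, float, str]:
    """Run the cheap model first; use the costly one only if confidence < threshold."""
    label, confidence = top_label(fast_model(text))
    if confidence >= threshold:
        return label, confidence, "fast"
    label, confidence = top_label(strong_model(text))
    return label, confidence, "strong"


# Hypothetical stand-ins for a trained BiGRU and a trained BERT classifier.
# The category names are assumptions made for this example only.
def toy_bigru(text: str) -> Dict[str, float]:
    if "casino" in text:
        return {"gambling": 0.92, "investment": 0.05, "other": 0.03}
    return {"gambling": 0.40, "investment": 0.35, "other": 0.25}


def toy_bert(text: str) -> Dict[str, float]:
    if "returns" in text:
        return {"gambling": 0.05, "investment": 0.90, "other": 0.05}
    return {"gambling": 0.10, "investment": 0.10, "other": 0.80}
```

Calling `classify_text("online casino bonus", toy_bigru, toy_bert)` stays on the fast path because the toy BiGRU is confident, while a page such as `"guaranteed returns daily"` falls below the threshold and is passed to the second model. The design intent is that the expensive model runs only on the uncertain minority of pages.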

Keywords: Fraudulent website detection; Website classification; Random forest; Deep learning

Authors: You Chang (游畅), Huang Cheng (黄诚), Tian Xuan (田璇), Yan Wei (燕玮), Leng Tao (冷涛)

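School of Cyber Science and Engineering, Sichuan University, Chengdu 610065

The Sixth Research Institute of China Electronics Corporation, Beijing 102209

Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou 646099

Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100864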

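Funding: Open Project of the Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2024KFZD003); Applied Basic Research Project of the Sichuan Provincial Department of Science and Technology (2022NSFSC0752)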

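Journal of Sichuan University (Natural Science Edition), Sichuan University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.358
ISSN: 0490-6756
Year, volume (issue): 2024, 61(4)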