Phishing detection algorithm based on attention and feature fusion

Original title: 基于注意力机制与特征融合的网络钓鱼检测算法

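Chinese abstract (translated): Phishing is the main means by which attackers commit online fraud. As the national crackdown on online fraud intensifies, phishing campaigns have become increasingly adversarial in their techniques, which puts considerable pressure on phishing detection. For example, phishing attacks often replace text with images and apply small shifts or rotations to high-weight website logo images in order to evade traditional detection algorithms based on text or image features. To address this growing adversarial sophistication, a phishing detection algorithm based on the attention mechanism and feature fusion is proposed. It builds a two-stage hierarchical classification model that fuses domain name, web page structure, web page text, and web page icon features, and can effectively counter attackers' various evasion strategies. In the first stage, the algorithm exploits the lightweight nature of a machine learning model, fusing domain name, text, and page structure features to pre-recall a subset of suspicious domains from a massive set of domain names. In the second stage, working on this candidate subset, it introduces the attention mechanism to deepen the extraction of global text association features between a sample and the impersonated target, adds contrast features between the icons of the sample and the impersonated target, and builds a deep classification model that fuses text and image features. The effectiveness of the algorithm is verified experimentally. This layered detection approach avoids extracting image data for the massive set of domains to be checked, and greatly improves detection efficiency while maintaining detection accuracy.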

English abstract: Phishing has been the primary means used by attackers to conduct cyber fraud. As national anti-cyber-fraud efforts continue to increase, phishing activities have become more technically adversarial, placing significant pressure on phishing detection work. For instance, current phishing attacks often employ images in place of text and apply small-scale shifts or rotations to high-weight website logo images to evade traditional detection algorithms that rely on text or image features. To address the problem of escalating adversarial phishing techniques, a phishing detection algorithm based on the attention mechanism and feature fusion was proposed, and a hierarchical classification model was established. This model comprised two stages and fused features of domain names, web page structure, web text, and web icons, making it capable of countering the various technical adversarial strategies employed by attackers. In the first stage, the algorithm leveraged the lightweight characteristics of the machine learning model to pre-recall a subset of suspicious domain names from a multitude of domain names. This was achieved by fusing domain name, text, and web page structure features. In the second stage, based on the candidate subset, the attention mechanism was introduced to enhance the extraction of global text association features between the samples and the counterfeited objects. Additionally, contrast features between the icons of the samples and those of the counterfeited objects were added, and a deep classification model fusing text and image features was established. The effectiveness of the algorithm was verified experimentally. This hierarchical detection method avoids extracting image data for the large number of domain names to be detected, significantly improving detection efficiency while maintaining detection accuracy.
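
The two-stage design described in the abstract can be read as a cascade: a cheap scorer runs on every domain, and the costly text-and-image model runs only on the pre-recalled subset. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The names `Sample`, `cascade_detect`, `toy_stage1`, and `toy_stage2`, the two thresholds, and all scores are hypothetical; the abstract does not specify model types, features, or thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    domain: str
    # Hypothetical stand-in for fused domain, text, and page-structure features.
    cheap_features: List[float]


def cascade_detect(
    samples: List[Sample],
    stage1_score: Callable[[Sample], float],
    stage2_score: Callable[[Sample], float],
    recall_threshold: float = 0.3,    # assumed value; kept low to favour recall
    decision_threshold: float = 0.5,  # assumed value
) -> List[str]:
    """Return the domains flagged by a two-stage cascade."""
    # Stage 1: lightweight scoring over every sample.
    suspicious = [s for s in samples if stage1_score(s) >= recall_threshold]
    # Stage 2: the expensive model sees only the pre-recalled subset.
    return [s.domain for s in suspicious if stage2_score(s) >= decision_threshold]


def toy_stage1(sample: Sample) -> float:
    # Toy scorer: the mean of the cheap features.
    return sum(sample.cheap_features) / len(sample.cheap_features)


# Made-up scores standing in for a deep text-and-icon model.
TOY_DEEP_SCORES: Dict[str, float] = {
    "examp1e-login.test": 0.92,
    "news-portal.test": 0.12,
}
stage2_calls: List[str] = []  # records which domains reach stage 2


def toy_stage2(sample: Sample) -> float:
    stage2_calls.append(sample.domain)
    return TOY_DEEP_SCORES[sample.domain]


samples = [
    Sample("example.org", [0.1, 0.0, 0.2]),         # filtered out in stage 1
    Sample("examp1e-login.test", [0.9, 0.8, 0.7]),  # reaches stage 2
    Sample("news-portal.test", [0.4, 0.3, 0.5]),    # reaches stage 2
]
flagged = cascade_detect(samples, toy_stage1, toy_stage2)
print(flagged)
print(len(stage2_calls), "of", len(samples), "samples reached stage 2")
```

In this toy run only two of the three domains reach the second stage, which mirrors the efficiency argument in the abstract: image extraction and deep inference are skipped for domains that the lightweight stage already rules out.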

English keywords:

phishing detection; hierarchical feature fusion; attention mechanism

Authors:

张思睿、延志伟、董科军、尉迟学彪

Affiliation:

China Internet Network Information Center, Beijing 100190

Chinese keywords:

网络钓鱼检测 (phishing detection); 层次特征融合 (hierarchical feature fusion); 注意力机制 (attention mechanism)

Funding:

National Key Research and Development Program of China

Project number:

2022YFB3105000

Publication year:

2024

DOI:

10.11959/j.issn.2096-109x.2024058

Journal: Chinese Journal of Network and Information Security (网络与信息安全学报)

Publisher: Posts & Telecom Press (人民邮电出版社)

CSTPCD

ISSN: 2096-109X

Year, volume (issue): 2024, 10(4)