A deep learning based approach for early detection of abused domain names

Harmful websites rely on the domain name system (DNS) to disseminate harmful content, which seriously affects the healthy development of the Internet. Identifying the domain names that correspond to such websites (i.e., abused domain names) as early as possible, and disposing of them accordingly, is therefore of great importance to the operation and management of the DNS. From the perspective of managing the country-code top-level domain (.CN), this paper focuses on detecting abused domain names at the registration stage. Our analysis finds that abused domain names differ markedly from normal domain names in two dimensions: registration characteristics and text structure. Based on this observation, a deep learning based approach for early detection of abused domain names is proposed. Specifically, the method first extracts the registration information features of a domain name, and uses the pre-trained language model BERT (bidirectional encoder representations from transformers) to extract the text semantic features of the domain name itself. Next, it fuses the two types of features with an attention mechanism. Finally, a fully connected neural network is used to build the domain name classifier, whose output indicates whether a given domain name is abused. Experiments on real-life network data show that the F1 score of the proposed method can reach 0.99. Ablation results also confirm the effectiveness and necessity of the selected features.
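The fuse-then-classify step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' model: the feature dimension, the random placeholder vectors standing in for the registration features and the BERT text embedding, the `query` vector, and the single-layer classifier weights are all assumptions made for the example, and nothing here is trained.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse equal-length feature vectors using softmax attention weights.

    Each vector is scored against a (hypothetical) query vector with a
    scaled dot product; the fused vector is the weighted sum.
    """
    dim = len(query)
    scores = [dot(f, query) / math.sqrt(dim) for f in features]
    weights = softmax(scores)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


def classify(fused, weight, bias):
    """One fully connected layer with a sigmoid output in (0, 1)."""
    z = dot(fused, weight) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder inputs (assumptions): in a real pipeline these would be the
# projected registration features and the BERT embedding of the domain string.
rng = random.Random(0)
DIM = 8
reg_features = [rng.uniform(-1, 1) for _ in range(DIM)]
text_features = [rng.uniform(-1, 1) for _ in range(DIM)]
query = [rng.uniform(-1, 1) for _ in range(DIM)]
clf_weight = [rng.uniform(-1, 1) for _ in range(DIM)]

fused, weights = attention_fuse([reg_features, text_features], query)
prob = classify(fused, clf_weight, 0.0)
print("attention weights:", weights)
print("abuse probability (untrained):", prob)
```

In practice the query and classifier parameters would be learned jointly, and the attention weights show how much each feature type contributes to a given decision.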

Keywords: domain name system (DNS); domain name classification; deep learning; pre-trained language model

Authors: Hu Anlei (胡安磊), Tian Yu (田语), Chen Yong (陈勇), Li Zhenyu (李振宇), Xie Gaogang (谢高岗)

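Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

China Internet Network Information Center, Beijing 100190

University of Chinese Academy of Sciences, Beijing 100049

Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083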

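Funding: National Key Research and Development Program of China (2022YFB3103000); National Natural Science Foundation of China Regional Joint Key Fund (U20A20180); National Natural Science Foundation of China Regional Joint Key Fund (62072437)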

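Journal: 高技术通讯 (Chinese High Technology Letters)
Publisher: Institute of Scientific and Technical Information of China
Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.19
ISSN: 1002-0470
Year, volume (issue): 2024, 34(2)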