A deep learning based approach for early detection of abused domain names
Harmful websites rely on the domain name system (DNS) to disseminate unhealthy content, and these websites adversely affect the development of the Internet. It is therefore of great importance for DNS operation and management to identify the domain names that correspond to harmful websites (i.e., the abused domain names) as early as possible and to dispose of them accordingly. From the perspective of managing the country-code top-level domain (.CN), this paper focuses on detecting abused domain names at the registration stage. We find distinct differences between abused and normal domain names in terms of registration characteristics and text structure. Based on this observation, a deep learning based approach for early detection of abused domain names is proposed. Specifically, the proposed method first extracts the registration information features of the domain names, as well as the text semantic features of the domain names themselves using the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. Next, the method leverages the attention mechanism to coordinate the two types of features. Finally, a fully connected neural network is used to construct the domain name classification model, whose output indicates whether a given domain name is abused. Extensive experiments on real-life network data show that the F1 score of the proposed method can reach as high as 0.99. The ablation results also demonstrate the effectiveness and necessity of using the selected features to construct the classification model.
domain name system (DNS); domain name classification; deep learning; pre-training model
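The three-stage pipeline summarized in the abstract (two feature vectors, attention-based coordination, fully connected classifier) can be illustrated with a minimal sketch. The abstract does not specify the attention formulation, the feature dimensions, or the network depth, so everything concrete here is an assumption: dot-product softmax attention stands in for the attention mechanism, a single sigmoid unit stands in for the fully connected network, and random vectors stand in for the BERT text embedding and the encoded registration features. The names `attention_fuse`, `dense_sigmoid`, `reg_vec`, `text_vec`, `query`, and `fc_weights` are hypothetical, and the weights are untrained, so the output probability carries no meaning beyond showing the data flow.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(features, query):
    """Fuse feature vectors as a softmax(query . feature)-weighted sum.

    Assumption: plain dot-product attention; the paper may use another form.
    """
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


def dense_sigmoid(x, w, b):
    """Single fully connected unit with a sigmoid output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))


rng = random.Random(0)
DIM = 8  # hypothetical shared feature dimension

# Stand-ins: a real system would encode registration data and take the
# text vector from a pre-trained BERT model.
reg_vec = [rng.uniform(-1, 1) for _ in range(DIM)]
text_vec = [rng.uniform(-1, 1) for _ in range(DIM)]

# Untrained, randomly initialised parameters (learned during training in practice).
query = [rng.uniform(-1, 1) for _ in range(DIM)]
fc_weights = [rng.uniform(-1, 1) for _ in range(DIM)]

fused, weights = attention_fuse([reg_vec, text_vec], query)
prob_abused = dense_sigmoid(fused, fc_weights, 0.0)

print("attention weights (registration, text):", weights)
print("probability that the domain is abused:", prob_abused)
```

In this sketch the two attention weights sum to one, so they can be read as the relative contribution of the registration features and the text features to the fused representation for a given domain name.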