Research on detection and classification of fraudulent websites based on multi-dimensional features
With the development and widespread use of the Internet, the tactics of fraudulent groups and their anti-detection technologies have advanced significantly. Consequently, the detection and classification of fraudulent websites have become increasingly important for maintaining cybersecurity. Traditional detection methods, however, are proving insufficient against emerging forms of deceptive websites, and there is a notable dearth of research on the classification of these sites. To address this issue, this paper analyzes the typical features of new fraudulent websites and proposes a multi-dimensional feature-based system for detecting and classifying them, which uses a total of 11 types of fraudulent website features and 3600 web keywords to represent fraudulent websites. The system first uses a crawler to obtain the web page screenshot, WHOIS information, and source code of the domain to be detected, and then passes them to the feature extraction module to construct a multi-dimensional feature set. The detection module extracts the website domain name, code structure, and WHOIS information as features and constructs a random forest model to perform the detection task. Subsequently, based on the detection results, the webpage classification module uses a bi-directional GRU to obtain the textual features of the webpage. When the confidence level is below 0.7, the module employs a BERT model instead, so as to ensure both accuracy and efficiency. Additionally, a residual neural network extracts features from the webpage screenshot, the similarity between the images within the webpage and the website logo is calculated, and a random forest model performs the classification. Comparison experiments were conducted to evaluate the accuracy of the method. The experimental results demonstrate that our method achieves the highest accuracy, with an average F1-score of 97.28%. Moreover, the results show that the multi-dimensional feature model effectively distinguishes between fraudulent and legitimate websites, overcomes the limitations of traditional methods in detecting new fraudulent websites, and is suitable for the rapid detection and classification of fraudulent websites with new domain names on a global scale.
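The confidence-gated step in the text classification module (a bi-directional GRU first, BERT only when confidence falls below 0.7) can be illustrated with a minimal sketch. Only the 0.7 threshold comes from the abstract. The function names, the toy models, and the category labels below are hypothetical stand-ins chosen for illustration and are not the paper's implementation.

```python
from typing import Callable, Tuple

# A text classifier maps page text to (predicted_label, confidence in [0, 1]).
TextClassifier = Callable[[str], Tuple[str, float]]

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in the abstract


def cascade_classify(
    text: str,
    fast_model: TextClassifier,
    slow_model: TextClassifier,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, float, str]:
    """Run the cheap model first; consult the expensive one only if unsure.

    Returns (label, confidence, which_model_decided).
    """
    label, confidence = fast_model(text)
    if confidence < threshold:
        # Low confidence: defer to the heavier model (BERT in the paper).
        label, confidence = slow_model(text)
        return label, confidence, "slow"
    return label, confidence, "fast"


# Hypothetical stand-in for the bi-directional GRU classifier.
def toy_fast_model(text: str) -> Tuple[str, float]:
    if "casino" in text.lower():
        return "gambling", 0.9
    return "unknown", 0.5


# Hypothetical stand-in for the BERT classifier.
def toy_slow_model(text: str) -> Tuple[str, float]:
    if "invest" in text.lower():
        return "investment", 0.95
    return "other", 0.8


if __name__ == "__main__":
    for page in ("Best casino bonus", "Invest now for guaranteed returns"):
        print(page, "->", cascade_classify(page, toy_fast_model, toy_slow_model))
```

The design intent of such a cascade is that most pages are settled by the cheaper model, and the slower model is paid for only on ambiguous inputs, which is one plausible reading of the abstract's "accuracy and efficiency" claim.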