Research on a BERT-Based Classification Method for Black and Grey Industry Web Pages
This paper proposes a website classification algorithm that uses the BERT model to identify specific websites. The algorithm uses BERT to extract feature vectors from the sentences of a web page's text, and adopts a self-attention layer to ease the demands on computer hardware. The sentence vectors are classified with a kernel SVM classifier, and focal loss is used to handle data imbalance. Experimental results show that this method achieves significantly higher classification accuracy than traditional machine learning algorithms and a standalone BERT model.
web page classification; BERT; data imbalance; deep learning
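
The abstract names focal loss as its remedy for data imbalance but gives no formula or settings. As a rough illustration of the idea only, the sketch below computes the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single example. The function names, the binary setting, and the values gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration; they are not taken from the paper, which does not state how the loss is configured or where it enters the pipeline.

```python
import math


def cross_entropy(prob_positive, label):
    """Plain binary cross-entropy for one example (reference point)."""
    eps = 1e-12
    p = min(max(prob_positive, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if label == 1 else 1.0 - p           # probability of the true class
    return -math.log(p_t)


def focal_loss(prob_positive, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    prob_positive: predicted probability of the positive class.
    label: 1 for the positive class, 0 for the negative class.
    gamma, alpha: illustrative values, NOT settings reported by the paper.
    """
    eps = 1e-12
    p = min(max(prob_positive, eps), 1.0 - eps)
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha  # class weighting term
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    # so hard (often minority-class) examples dominate the total.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.95, 1)  # confidently correct prediction
    hard = focal_loss(0.30, 1)  # poorly classified positive example
    print(easy < hard)
```

With gamma set to 0 the modulating factor disappears and the function reduces to alpha-weighted cross-entropy, which is a quick way to check that an implementation behaves as expected.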