To address polysemy and the insufficient extraction of features from long sequences in traditional spam classification methods, a spam classification method based on a BERT-SELFATT-CNN model is proposed. A pre-trained BERT model, a dynamic text representation method, is applied to the email content to generate word vectors that carry contextual semantic information. A self-attention layer then computes the similarity between words. Because these computations can run in parallel, the layer efficiently captures long-distance dependencies within sentences. The resulting hidden-layer vectors are fed into a CNN, which extracts their local features. Experimental results show that the model improves accuracy, recall, and F1 score, and that it also trains faster.
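The three-stage pipeline described above (contextual word vectors, then self-attention, then a CNN) can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation. Random vectors stand in for BERT word vectors, all weights are random rather than learned, the attention is standard scaled dot-product self-attention, and the classifier head (a single logistic unit over max-pooled CNN features) is a hypothetical choice. The abstract does not specify any of these details, and all function names are illustrative.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over token vectors H (n x d)."""
    Q = [matvec(Wq, h) for h in H]
    K = [matvec(Wk, h) for h in H]
    V = [matvec(Wv, h) for h in H]
    scale = math.sqrt(len(K[0]))
    out = []
    for q in Q:
        # Similarity of this token to every token in the sequence, so distant
        # words contribute directly; each row is independent and parallelizable.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


def conv1d_maxpool(H, filters):
    """Slide each filter over windows of consecutive token vectors,
    apply ReLU, then max-pool over positions (one feature per filter)."""
    features = []
    for f in filters:  # f is a list of `width` weight vectors
        width = len(f)
        acts = []
        for start in range(len(H) - width + 1):
            s = sum(sum(w * x for w, x in zip(f[i], H[start + i]))
                    for i in range(width))
            acts.append(max(0.0, s))
        features.append(max(acts))
    return features


def spam_probability(features, w, b):
    # Hypothetical classifier head: one logistic unit over the CNN features.
    z = sum(a * c for a, c in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Demo with random stand-ins (assumption: no real BERT model or trained weights).
random.seed(0)
n_tokens, d, n_filters, width = 6, 4, 3, 2

H_bert = rand_matrix(n_tokens, d)  # placeholder for BERT word vectors
Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
H_att = self_attention(H_bert, Wq, Wk, Wv)
filters = [rand_matrix(width, d) for _ in range(n_filters)]
features = conv1d_maxpool(H_att, filters)
prob = spam_probability(features, rand_matrix(1, n_filters)[0], 0.0)
print(f"spam probability: {prob:.3f}")
```

In this sketch, max-pooling over positions is what turns an email of any length (at least `width` tokens) into a fixed-size feature vector for the classifier. A real system would instead take the word vectors from a pre-trained BERT encoder and learn the attention, convolution, and classifier weights from labeled emails.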