Research on Intelligent Patent Classification Technology Based on a SimBERT+CNN Model
This paper presents a SimBERT+CNN deep learning model for intelligent patent classification in the tobacco industry, using tobacco-related technology patents as examples. The research method is as follows. Tobacco-related patents, covering both tobacco technology and non-tobacco technology patents, are manually annotated with a two-level technology classification to serve as sample data for deep learning. For patents with X-type citations, the claims and the corresponding text paragraphs of the cited patents are extracted as sentence pairs, which are used to optimize the training of the SimBERT semantic model. The optimized SimBERT model then generates text feature vectors and IPC classification code feature vectors for the tobacco-industry patent classification samples; these features are concatenated and fed into a CNN model. Empirical training and testing on more than 150,000 tobacco technology patents and 20,000 non-tobacco technology patents show that the SimBERT+CNN model optimized in this way achieves higher accuracy than a BERT+CNN model in both first-level tobacco technology classification and second-level technology classification.
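The feature-fusion step can be illustrated with a minimal, untrained sketch in pure Python. It assumes the SimBERT text vector and the IPC feature vector have already been computed (toy values stand in for them below). It also assumes, for illustration only, that the concatenated vector is treated as a single-channel 1-D signal passed through one convolution layer with ReLU, global max pooling, and a softmax output layer. The names `FusionCNNClassifier`, `conv1d`, `global_max_pool`, `dense` and `softmax`, as well as the layer sizes and the single-layer layout, are hypothetical. The abstract does not specify the actual CNN configuration or training procedure.

```python
import math
import random
from typing import List


def conv1d(x: List[float], kernels: List[List[float]], biases: List[float]) -> List[List[float]]:
    """Valid 1-D convolution of a single-channel signal with several kernels, then ReLU."""
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        fmap = []
        for i in range(len(x) - width + 1):
            s = sum(x[i + j] * kernel[j] for j in range(width)) + bias
            fmap.append(max(0.0, s))  # ReLU activation
        feature_maps.append(fmap)
    return feature_maps


def global_max_pool(feature_maps: List[List[float]]) -> List[float]:
    """Keep the strongest response of each filter."""
    return [max(fmap) for fmap in feature_maps]


def dense(x: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FusionCNNClassifier:
    """Hypothetical CNN head over concatenated [text vector | IPC vector] features (weights untrained)."""

    def __init__(self, input_dim: int, num_filters: int, kernel_width: int,
                 num_classes: int, seed: int = 0) -> None:
        if kernel_width > input_dim:
            raise ValueError("kernel_width must not exceed input_dim")
        rng = random.Random(seed)
        self.input_dim = input_dim
        self.kernels = [[rng.uniform(-0.1, 0.1) for _ in range(kernel_width)]
                        for _ in range(num_filters)]
        self.conv_biases = [0.0] * num_filters
        self.out_weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_filters)]
                            for _ in range(num_classes)]
        self.out_biases = [0.0] * num_classes

    def predict_proba(self, text_vec: List[float], ipc_vec: List[float]) -> List[float]:
        x = list(text_vec) + list(ipc_vec)  # feature concatenation
        if len(x) != self.input_dim:
            raise ValueError(f"expected {self.input_dim} features, got {len(x)}")
        pooled = global_max_pool(conv1d(x, self.kernels, self.conv_biases))
        return softmax(dense(pooled, self.out_weights, self.out_biases))


# Toy usage: 8-dim stand-in for a SimBERT vector, 4-dim stand-in for IPC features,
# two output classes (e.g. tobacco vs. non-tobacco technology at the first level).
clf = FusionCNNClassifier(input_dim=12, num_filters=4, kernel_width=3, num_classes=2)
text_vec = [0.1 * i for i in range(8)]
ipc_vec = [1.0, 0.0, 0.0, 1.0]
probs = clf.predict_proba(text_vec, ipc_vec)
print(probs)
```

In a real system the weights would be learned from the annotated samples, and a deep learning framework would replace these hand-written loops. The sketch only shows how the two feature vectors are fused before convolution.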