Research onPatent Technology Subject Clustering Based on Sentence-BERT:Taking the Field of Artificial Intelligence as an Example
[Research purpose]The Sentence-Bert model is applied to patent technology topic clustering to solve the problem of sparse se-mantic features of lexical vectors caused by the frequent use of unique technical terms in patent documents in order to highlight novelty.[Research method]The study takes 22370 patents in the field of artificial intelligence from 2015 to 2019 as experimental data.Firstly,the Sentence-Bert algorithm is used to vectorize the patent document abstract text;Secondly,the data dimension of the vectorization ma-trix is reduced,and the HDBSCAN method is used to find the high-density clusters in the original data;Finally,the topic features in the class cluster text collection are identified and the topic presentation was completed.[Research conclusion]Compared with LDA topic model,K-means,doc2vec and other methods,the experimental results of this study improves the granularity and accuracy of topic divi-sion,and obtains better topic consistency.How to use the fine tune strategy to further improve the effect of the model is the direction of further exploration of this method in the future.