Multi-label Patent Classification Based on Text and Historical Data
Patent classification, which assigns multiple International Patent Classification (IPC) codes to a given patent, is an important task in the field of patent data mining. In recent years, many studies on this task have focused on mining patent text to predict first- or second-level IPC codes. In real scenarios, however, a patent often has multiple IPC codes, which makes this a multi-label classification task. Apart from its text, each patent has a corresponding assignee, and the assignee's historical patent application behavior may reflect a certain business tendency. A preference representation of this behavior can effectively improve the precision of patent classification, yet previous methods fail to make full use of historical patent data. This paper proposes a model for automatic patent classification. The model works in three steps: first, the patent text representation is initialized with the BERT pre-trained language model, and a Text-CNN model then captures local features, whose output serves as the final patent text representation; second, a Bi-LSTM learns the assignee's preference representation by aggregating historical patent texts and labels through dual channels; finally, the text representation and the assignee's sequential preference are fused for prediction. Experiments on a real data set and comparisons with different baselines show that the proposed patent classification algorithm, based on patent text and historical data, substantially improves precision.
Deep learning; Automatic classification of multi-label patents; IPC codes; Patent
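
The final step described in the abstract, fusing the text representation with the assignee's preference representation and predicting several IPC codes at once, can be illustrated with a minimal sketch. The abstract does not specify the fusion operator or the decision rule, so the concatenation, the per-label sigmoid, and the 0.5 threshold below are assumptions, and the function names, vectors, and weights are hypothetical toy values rather than outputs of BERT, Text-CNN, or Bi-LSTM.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_and_predict(text_vec, pref_vec, weights, bias, threshold=0.5):
    """Score every IPC code independently from a fused representation.

    Assumption: fusion is plain concatenation followed by one linear
    layer per label; the source abstract does not state the operator.
    """
    fused = list(text_vec) + list(pref_vec)  # concatenate the two views
    probs = {}
    for label, w in weights.items():
        if len(w) != len(fused):
            raise ValueError(f"weight size mismatch for label {label}")
        logit = sum(wi * xi for wi, xi in zip(w, fused)) + bias[label]
        probs[label] = sigmoid(logit)
    # Multi-label decision: keep every code whose probability clears the threshold.
    predicted = {label for label, p in probs.items() if p >= threshold}
    return probs, predicted


def precision(predicted, gold):
    """Fraction of predicted codes that are correct (0.0 if nothing is predicted)."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


if __name__ == "__main__":
    # Hypothetical 2-dimensional representations and hand-set weights.
    text_vec = [1.0, 0.0]   # stand-in for the text representation
    pref_vec = [0.0, 1.0]   # stand-in for the assignee preference representation
    weights = {
        "G06F": [2.0, 0.0, 0.0, 2.0],
        "H04L": [-1.0, 0.0, 0.0, 2.0],
        "G06N": [1.0, 0.0, 0.0, -3.0],
    }
    bias = {"G06F": 0.0, "H04L": 0.0, "G06N": 0.0}
    probs, predicted = fuse_and_predict(text_vec, pref_vec, weights, bias)
    print(sorted(predicted), precision(predicted, {"G06F", "G06N"}))
```

In this toy setup the code H04L is selected only because of the preference part of the fused vector, which mirrors the abstract's point that an assignee's history can shift the prediction beyond what the text alone suggests.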