Paper and Patent Data Fusion Based on Deep Text Clustering
[Objective] This study fuses papers and patents by research topic to bridge the language gap between the two document types. [Method] Using Wikipedia as the primary classification system, we semi-automatically constructed a small set of annotated data. We then designed a semi-supervised deep text clustering model that groups papers and patents with similar topics. Finally, we created indicators to evaluate the quality of the data fusion. [Results] Our model's clustering accuracy was 2.4% to 11.9% higher than that of the baseline models, and its data fusion quality score reached 0.9. Starting from the known topics, the model can also identify supplementary research topics. [Limitations] We did not conduct an empirical analysis of the fused data, and the number of clusters must be set manually. [Conclusion] The proposed model can extract topic-related features from the differently worded texts of papers and patents, and it thereby fuses the two data sources effectively.
Keywords: Deep Text Clustering; Data Fusion; Papers; Patents; Research Topic Identification
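The abstract reports "clustering accuracy" without defining it. A common definition in the clustering literature is the fraction of documents assigned correctly under the best one-to-one mapping between cluster ids and ground-truth topic labels. The sketch below assumes that definition; the paper's exact metric may differ. The function name `clustering_accuracy` is hypothetical, and the brute-force search over mappings is only practical for a small number of clusters.

```python
from itertools import permutations
from typing import Hashable, Optional, Sequence


def clustering_accuracy(true_labels: Sequence[Hashable],
                        cluster_ids: Sequence[Hashable]) -> float:
    """Fraction of items correctly grouped under the best one-to-one mapping
    from cluster ids to topic labels (assumed metric definition).

    Brute force over all mappings: exact, but only feasible for few clusters.
    """
    if len(true_labels) != len(cluster_ids) or len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))

    # Pad the label list with None so every cluster can map to a distinct target;
    # a cluster mapped to None contributes no correct assignments.
    k = max(len(labels), len(clusters))
    targets: list[Optional[Hashable]] = list(labels) + [None] * (k - len(labels))

    # Co-occurrence counts of (cluster id, true label) pairs.
    counts: dict[tuple, int] = {}
    for t, c in zip(true_labels, cluster_ids):
        counts[(c, t)] = counts.get((c, t), 0) + 1

    # Try every injective mapping cluster -> target and keep the best match count.
    best = 0
    for mapping in permutations(targets, len(clusters)):
        matched = sum(counts.get((c, t), 0) for c, t in zip(clusters, mapping))
        best = max(best, matched)
    return best / len(true_labels)


if __name__ == "__main__":
    # Hypothetical toy data: topic labels for six documents (papers and patents)
    # and the cluster ids a model assigned to them.
    topics = [0, 0, 1, 1, 2, 2]
    predicted = [1, 1, 0, 2, 2, 2]
    print(clustering_accuracy(topics, predicted))
```

With the toy data above, the best mapping sends cluster 1 to topic 0, cluster 0 to topic 1, and cluster 2 to topic 2, matching five of six documents. For realistic cluster counts, the same optimal mapping is usually found with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on the negated co-occurrence matrix) instead of enumerating permutations.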