A Deep Learning-Based Method for Extracting Professional Terms from Chinese Books
Chinese text has no explicit word boundaries, which makes identifying professional terms in unstructured Chinese text very challenging; at the same time, professional term recognition has a wide range of application scenarios. We design a new method for extracting professional terms from Chinese text in any domain. First, we segment the text into words; then we use an improved BERT-based word representation method to obtain word vectors; finally, we apply an autoencoder-based deep clustering method to extract the Chinese professional terms. Comparative experiments were conducted on publicly available datasets and on data from self-selected professional books. Compared with other methods, the improved algorithm shows significant improvements in accuracy, recall, and F1 score.
Keywords: professional terms; deep learning; deep clustering; named entity recognition; machine learning
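The abstract does not specify how the BERT representation is improved or which clustering objective is used. The Python sketch below assumes two common choices purely for illustration: word vectors obtained by mean-pooling the character-level vectors that a Chinese BERT model produces over each segmented word, and the soft-assignment and target-distribution step of Deep Embedded Clustering (DEC), one well-known autoencoder-based deep clustering method. The BERT model and the autoencoder's encoder are omitted; toy vectors stand in for their outputs, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def pool_word_vectors(char_vecs: Sequence[Vector], words: Sequence[str]) -> List[Tuple[str, Vector]]:
    """Average the per-character vectors covering each segmented word.

    Assumption: char_vecs[i] is the contextual vector of the i-th character of the
    sentence (as a character-level Chinese BERT would produce), and the words,
    concatenated in order, reproduce the sentence exactly.
    """
    pooled: List[Tuple[str, Vector]] = []
    pos = 0
    for word in words:
        span = char_vecs[pos:pos + len(word)]
        if not word or len(span) != len(word):
            raise ValueError("segmentation does not match the character vectors")
        dim = len(span[0])
        pooled.append((word, [sum(v[d] for v in span) / len(span) for d in range(dim)]))
        pos += len(word)
    if pos != len(char_vecs):
        raise ValueError("segmentation does not cover every character")
    return pooled


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assign(z: Sequence[Vector], centroids: Sequence[Vector], alpha: float = 1.0) -> List[Vector]:
    """DEC soft assignment q_ij using a Student's t kernel with alpha degrees of freedom."""
    q = []
    for zi in z:
        kernel = [(1.0 + sq_dist(zi, mu) / alpha) ** (-(alpha + 1.0) / 2.0) for mu in centroids]
        total = sum(kernel)
        q.append([k / total for k in kernel])
    return q


def target_distribution(q: Sequence[Vector]) -> List[Vector]:
    """DEC target p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij (soft cluster size)."""
    freq = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        weighted = [row[j] ** 2 / freq[j] for j in range(len(row))]
        total = sum(weighted)
        p.append([w / total for w in weighted])
    return p


def kl_divergence(p: Sequence[Vector], q: Sequence[Vector]) -> float:
    """Clustering loss KL(P || Q) summed over all points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


if __name__ == "__main__":
    # Toy 2-d "character vectors" for a six-character sentence split into two words.
    chars = [[1, 0], [3, 0], [1, 2], [3, 2], [0, 4], [0, 6]]
    for word, vec in pool_word_vectors(chars, ["深度学习", "方法"]):
        print(word, vec)
    # Toy embeddings standing in for encoder outputs, with two cluster centroids.
    z = [[0.0, 0.0], [10.0, 0.0]]
    q = soft_assign(z, [[0.0, 0.0], [10.0, 0.0]])
    p = target_distribution(q)
    print("loss:", kl_divergence(p, q))
```

Squaring q and dividing by the soft cluster size makes P sharper than Q while discouraging any single cluster from absorbing all points; in DEC the encoder and centroids would be trained to minimize KL(P || Q), with P recomputed periodically. Whether the paper uses this objective, and how its clusters are mapped to professional terms, is not stated in the abstract.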