
Professional Terms Extracting Method of Books in Chinese Based on Deep Learning

Chinese text lacks explicit word boundaries, so identifying Chinese professional terms in unstructured text is challenging, and term recognition technology is needed across a wide range of application scenarios. This paper proposes a new method for extracting professional terms from Chinese text in any domain. The text is first segmented into words; word vectors are then obtained with an improved BERT-based word representation method; finally, a deep clustering method based on an autoencoder extracts the Chinese professional terms. Comparative experiments were conducted on public datasets and on a self-selected corpus of professional books. Compared with other methods, the improved algorithm shows clear gains on three metrics: precision, recall, and F1 score.
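The three evaluation metrics named in the abstract can be illustrated with a short calculation. The sketch below is only an illustration: it assumes that an extracted term counts as correct when it exactly matches a gold-standard term (the abstract does not state the matching criterion), and the function name `term_extraction_scores` and the toy term lists are hypothetical, not taken from the paper's experiments.

```python
def term_extraction_scores(predicted, gold):
    """Return (precision, recall, f1) for a collection of extracted terms.

    Assumption: a predicted term is correct only if it exactly matches
    a gold term. The paper may use a different matching criterion.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not results from the paper.
gold_terms = {"神经网络", "梯度下降", "损失函数", "过拟合"}
extracted_terms = {"神经网络", "梯度下降", "学习", "过拟合", "数据"}

p, r, f1 = term_extraction_scores(extracted_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.60 recall=0.75 f1=0.67
```

Here three of the five extracted terms are correct (precision 3/5) and three of the four gold terms are found (recall 3/4), so F1 is their harmonic mean, 2/3.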

professional terms; deep learning; deep clustering; named entity recognition; machine learning

Nie Yaoxin (聂耀鑫), Jiang Donglai (蒋东来), Cheng Guojun (程国军)


Taiji Computer Corporation Limited (太极计算机股份有限公司), Beijing 100012


2024

Journal of Shijiazhuang University
Shijiazhuang University


Impact factor: 0.223
ISSN: 1673-1972
Year, volume (issue): 2024, 26(3)