
Dictionary Enhancement and Radical Perception for Sheep Disease Named Entity Recognition

The rapid development of information technology has produced a large volume of potentially valuable information on sheep diseases. However, few studies address named entity recognition (NER) for sheep disease texts, and general-purpose models struggle to represent sheep disease semantics. Compared with other domains, sheep disease NER also involves more unregistered (out-of-vocabulary) words. To address this, a sheep disease entity recognition model with dictionary enhancement and radical perception is proposed. The method builds a sheep disease dictionary and integrates a similarity weight matrix, computed between BERT's bottom-layer character vectors and the vectors of the dictionary words they match, so that dictionary information is embedded deep in the bottom layers and the general BERT model's difficulty in representing sheep disease information is alleviated. In addition, a convolutional neural network framework is used to extract the distinctive pictographic radical features of sheep disease entities: characters are recursively decomposed into their radicals and components, and the extracted radical features are concatenated with the BERT output feature sequence and mapped to the input layer of the downstream BiLSTM-CRF model, improving the perception of entity boundaries. Experiments show that the model is better suited to named entity recognition in sheep disease texts.
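
The three ingredients named in the abstract (dictionary matching, similarity-weighted fusion of character and word vectors, and recursive radical decomposition) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the paper's model uses BERT, a CNN, and BiLSTM-CRF, none of which is reproduced here. The toy lexicon, the toy decomposition table, the function names, and the choice of softmax over dot-product similarity are all assumptions made for illustration.

```python
import math

# Hypothetical toy lexicon; the paper's sheep disease dictionary is not given in the abstract.
LEXICON = {"羊口疮", "口疮", "羊痘"}

# Hypothetical toy decomposition table: character -> components.
COMPONENTS = {"痢": ["疒", "利"], "利": ["禾", "刂"]}


def match_words(sentence, lexicon, max_len=4):
    """For each character position, list the lexicon words that cover it."""
    matches = [[] for _ in sentence]
    for start in range(len(sentence)):
        longest_end = min(len(sentence), start + max_len)
        for end in range(start + 2, longest_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                for i in range(start, end):
                    matches[i].append(word)
    return matches


def fuse(char_vec, word_vecs):
    """Add a similarity-weighted sum of matched word vectors to a character vector.

    Weights are a softmax over dot-product similarities (an assumed choice).
    """
    if not word_vecs:
        return list(char_vec)
    scores = [sum(c * w for c, w in zip(char_vec, wv)) for wv in word_vecs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    fused = list(char_vec)
    for e, wv in zip(exps, word_vecs):
        for k, x in enumerate(wv):
            fused[k] += (e / total) * x
    return fused


def decompose(char, table):
    """Recursively split a character into components until none can be split."""
    parts = table.get(char)
    if parts is None:
        return [char]
    leaves = []
    for part in parts:
        leaves.extend(decompose(part, table))
    return leaves
```

In a full model of the kind the abstract describes, the per-character word matches would select word embeddings to fuse into BERT's lower layers, and the component sequences would be embedded and passed through a CNN before being concatenated with the BERT output.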

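Keywords: sheep disease; named entity recognition (NER); radical feature; bidirectional long short-term memory network (BiLSTM)

Authors: Yang Peng (杨朋), Wang Tianyi (王天一)

Affiliation: College of Big Data and Information Engineering, Guizhou University, Guiyang 550025

Funding: Guizhou Provincial Science and Technology Program project, grant no. 黔科合支撑[2021]一般176号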

2024

Computer & Digital Engineering (计算机与数字工程)
709th Research Institute, China Shipbuilding Industry Corporation

CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
Year, volume (issue): 2024, 52(2)