

Entity Recognition Method for Power Data Based on Enhanced Optimization Pre-trained Language Model
Knowledge graphs can effectively integrate multi-source data in power systems and raise the level of knowledge management in power grids. Because power-domain text datasets are scarce, entity types are diverse, and the text is highly specialized, this paper proposes an entity recognition method for power data based on an enhanced optimization pre-trained language model. The method expands the original dataset through data augmentation by entity word-bag replacement, performs dynamic semantic encoding with the enhanced optimization pre-trained language model RoBERTa, and uses a bidirectional long short-term memory network (BiLSTM) to extract features and a conditional random field (CRF) to optimize the output labels. Experimental results show that the method achieves an average F1 score 2.17% higher than traditional deep-learning-based entity recognition methods, confirming its effectiveness for recognizing entities when constructing knowledge graphs of power data.
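The entity word-bag replacement step mentioned in the abstract can be pictured with a short Python sketch. The abstract does not spell out the procedure, so everything here is an assumption: sentences are token sequences with BIO tags, the "word bag" is the set of distinct entity surface forms per entity type collected from the training data, and each entity is swapped, with probability `p`, for a randomly chosen entity of the same type while the surrounding context is kept. The function names (`extract_spans`, `build_entity_bag`, `augment`) and the `EQUIP`/`FAULT` labels are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Token = Tuple[str, str]        # (token, BIO tag), e.g. ("transformer", "I-EQUIP")
Span = Tuple[int, int, str]    # (start, end_exclusive, entity_type)
Bag = Dict[str, List[Tuple[str, ...]]]


def extract_spans(sentence: List[Token]) -> List[Span]:
    """Collect entity spans from a BIO-tagged sentence.

    A stray I-X that does not continue an open X span is treated as the start
    of a new span, which keeps the function tolerant of noisy annotations.
    """
    spans: List[Span] = []
    start, etype = None, None
    for i, (_, tag) in enumerate(sentence):
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue                        # still inside the current entity
        if start is not None:               # any other tag closes the open span
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":                      # B-X (or stray I-X) opens a new span
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(sentence), etype))
    return spans


def build_entity_bag(corpus: List[List[Token]]) -> Bag:
    """Hypothetical 'entity word bag': distinct surface forms per entity type."""
    bag = defaultdict(set)
    for sentence in corpus:
        for s, e, etype in extract_spans(sentence):
            bag[etype].add(tuple(tok for tok, _ in sentence[s:e]))
    # Sort so that sampling with a seeded RNG is reproducible.
    return {etype: sorted(forms) for etype, forms in bag.items()}


def augment(sentence: List[Token], bag: Bag, rng: random.Random,
            p: float = 1.0) -> List[Token]:
    """Swap each entity, with probability p, for a random same-type entity.

    Context tokens are copied unchanged; B-/I- tags are regenerated so that
    they match the length of the inserted entity.
    """
    out: List[Token] = []
    prev = 0
    for s, e, etype in extract_spans(sentence):
        out.extend(sentence[prev:s])                      # context before entity
        forms = bag.get(etype)
        if forms and rng.random() < p:
            new = rng.choice(forms)                       # same-type replacement
        else:
            new = tuple(tok for tok, _ in sentence[s:e])  # keep original entity
        out.append((new[0], f"B-{etype}"))
        out.extend((tok, f"I-{etype}") for tok in new[1:])
        prev = e
    out.extend(sentence[prev:])                           # trailing context
    return out


if __name__ == "__main__":
    # Toy corpus with hypothetical EQUIP (equipment) and FAULT labels.
    corpus = [
        [("main", "B-EQUIP"), ("transformer", "I-EQUIP"), ("has", "O"),
         ("oil", "B-FAULT"), ("leakage", "I-FAULT")],
        [("circuit", "B-EQUIP"), ("breaker", "I-EQUIP"), ("shows", "O"),
         ("overheating", "B-FAULT")],
    ]
    bag = build_entity_bag(corpus)
    for seed in range(3):
        print(augment(corpus[0], bag, random.Random(seed)))
```

Because replacement preserves the entity type and rewrites the B-/I- tags to fit the new span length, each augmented sentence stays a valid labeled example for the downstream RoBERTa-BiLSTM-CRF tagger. Whether the authors further constrain replacements (for example by entity length or sub-category) is not stated in the abstract.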

Keywords: knowledge graph; entity recognition; data augmentation; pre-trained language model; bidirectional long short-term memory network; conditional random field

Authors: Tian Xuehan (田雪涵), Dong Kun (董坤), Zhao Jianfeng (赵剑锋), Guo Xirui (郭希瑞)


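Affiliations:
School of Software, Southeast University, Suzhou, Jiangsu 215000, China
School of Electrical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu 210037, China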


Funding: National Natural Science Foundation of China (52077039)

Year: 2024

Journal: Smart Power (智慧电力), Shaanxi Electric Power Company (陕西省电力公司)


Indexing: CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 0.831
ISSN: 1673-7598
Year, volume (issue): 2024, 52(6)