Smart Power (智慧电力), 2024, Vol. 52, Issue 6: 100-107.

Entity Recognition Method for Power Data Based on Enhanced Optimization Pre-trained Language Model

Tian Xuehan (田雪涵) 1, Dong Kun (董坤) 2, Zhao Jianfeng (赵剑锋) 3, Guo Xirui (郭希瑞) 2

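Author information

  • 1. School of Software, Southeast University, Suzhou 215000, Jiangsu
  • 2. School of Electrical Engineering, Southeast University, Nanjing 210096, Jiangsu
  • 3. School of Electrical Engineering, Southeast University, Nanjing 210096, Jiangsu; College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, Jiangsu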

Abstract

Knowledge graphs can effectively integrate multi-source data in power systems and improve the level of grid knowledge management. Power text datasets are scarce, their entity types are diverse, and the text is highly specialized. To address these characteristics, an entity recognition method for power data based on an enhanced optimization pre-trained language model is proposed. The method expands the original dataset with a data augmentation technique based on entity word-bag replacement, uses the enhanced optimization pre-trained language model (RoBERTa) for dynamic semantic encoding, and uses a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF) to extract features and optimize labels. Experimental results show that the average F1 score of the proposed method is 2.17% higher than that of traditional deep-learning-based entity recognition methods, confirming its effectiveness for constructing knowledge graphs of power data.
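The abstract does not spell out how entity word-bag replacement works. The following is a minimal sketch of one common reading, assuming BIO-tagged sentences: entity mentions are collected into a bag per entity type, and each mention in a sentence is swapped for a randomly drawn mention of the same type. The function names (`extract_spans`, `build_entity_bags`, `augment`), the `EQU` label, and the sample sentences are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def extract_spans(tags):
    """Return (start, end, type) for each entity in a BIO tag sequence."""
    spans = []
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # Close the open span unless this tag continues it.
        if start is not None and tag != "I-" + etype:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def build_entity_bags(dataset):
    """Group every entity mention in the dataset by its entity type."""
    bags = defaultdict(set)
    for tokens, tags in dataset:
        for start, end, etype in extract_spans(tags):
            bags[etype].add(tuple(tokens[start:end]))
    # Sorted lists make random draws reproducible for a fixed seed.
    return {etype: sorted(mentions) for etype, mentions in bags.items()}


def augment(tokens, tags, bags, rng):
    """Replace each entity with a random same-type mention from its bag."""
    new_tokens, new_tags = [], []
    cursor = 0
    for start, end, etype in extract_spans(tags):
        # Copy the non-entity context unchanged.
        new_tokens.extend(tokens[cursor:start])
        new_tags.extend(tags[cursor:start])
        # Fall back to the original mention if the type has no bag.
        candidates = bags.get(etype) or [tuple(tokens[start:end])]
        replacement = rng.choice(candidates)
        new_tokens.extend(replacement)
        new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(replacement) - 1))
        cursor = end
    new_tokens.extend(tokens[cursor:])
    new_tags.extend(tags[cursor:])
    return new_tokens, new_tags


# Hypothetical toy data; "EQU" stands for an equipment entity type.
sample = [
    (["the", "main", "transformer", "tripped"], ["O", "B-EQU", "I-EQU", "O"]),
    (["breaker", "failed"], ["B-EQU", "O"]),
]
sample_bags = build_entity_bags(sample)
print(augment(sample[0][0], sample[0][1], sample_bags, random.Random(0)))
```

Because the tags are regenerated for every replacement, the augmented sentences stay label-consistent even when the new mention has a different length. They could then be fed to a RoBERTa-BiLSTM-CRF tagger like the one the abstract describes.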

Key words

knowledge graph/entity recognition/data augmentation/pre-trained language model/bidirectional long short-term memory network/conditional random field


Funding

National Natural Science Foundation of China (52077039)

Publication year

2024
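Journal: Smart Power (智慧电力)
Publisher: Shaanxi Electric Power Company

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.831
ISSN: 1673-7598
Citations: 1
References: 28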