Construction of a Knowledge Graph for Fully Mechanized Coal Mining Equipment Based on Joint Coding

Using knowledge graph technology for data management enables an effective representation of fully mechanized coal mining equipment, from which information with deep mining value can be obtained. However, fully mechanized coal mining equipment data are imbalanced, and some equipment categories contain only a few entities, which reduces the precision of entity recognition. To address these problems, a knowledge graph construction method for fully mechanized coal mining equipment based on joint coding is proposed. First, an ontology model of fully mechanized coal mining equipment is constructed to determine its concepts and relationships. Second, an entity recognition model is designed. The model uses a four-layer embedding structure (Token Embedding, Position Embedding, Sentence Embedding, and Task Embedding) together with a Transformer Encoder to encode the equipment data and to extract the dependency relationships between words and contextual features. It introduces a Chinese character library, encoded with a Word2vec model, to extract semantic rules among character glyphs and thereby handle rare characters in the equipment data. A GRU model then jointly encodes the character vectors of the equipment data and those of the character library to fuse their features, and a Lattice-LSTM model decodes the characters to obtain the entity recognition results. Finally, graph database technology is used to store and organize the extracted knowledge as a graph, completing the construction of the knowledge graph. Experiments on a fully mechanized coal mining equipment dataset show that the method improves entity recognition accuracy by more than 1.26% compared with existing methods, which partly alleviates the low accuracy caused by insufficient data when a knowledge graph of fully mechanized coal mining equipment is constructed from a small number of samples.
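The abstract names the encoding components but not how they are implemented. As a rough, hypothetical illustration of two of them, the pure-Python sketch below assumes that (a) the four embeddings are summed element-wise per character, as in ERNIE 2.0-style input layers, and (b) "joint coding" means concatenating each character's context vector with its character-library vector and running a standard GRU over the sequence. The Transformer Encoder, Word2vec training, and Lattice-LSTM decoder are omitted; all weights are random, and the dimensions, vocabulary, and names (`input_embedding`, `GRUCell`, `joint_encode`) are illustrative assumptions, not the authors' code.

```python
import math
import random


def make_table(rows, dim, rng):
    """Hypothetical randomly initialised lookup table / weight matrix (rows x dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def input_embedding(token_ids, sentence_id, task_id, tables):
    """Assumed ERNIE-style input: element-wise sum of token, position,
    sentence and task embeddings at every character position."""
    tok, pos, sent, task = tables
    return [[a + b + c + d for a, b, c, d in
             zip(tok[t], pos[i], sent[sentence_id], task[task_id])]
            for i, t in enumerate(token_ids)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class GRUCell:
    """Standard GRU cell (update gate z, reset gate r), biases omitted for brevity."""

    def __init__(self, in_dim, hid_dim, rng):
        self.wz, self.uz = make_table(hid_dim, in_dim, rng), make_table(hid_dim, hid_dim, rng)
        self.wr, self.ur = make_table(hid_dim, in_dim, rng), make_table(hid_dim, hid_dim, rng)
        self.wh, self.uh = make_table(hid_dim, in_dim, rng), make_table(hid_dim, hid_dim, rng)
        self.hid_dim = hid_dim

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.wh, x), matvec(self.uh, rh))]
        # New state interpolates between the old state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def joint_encode(context_vecs, char_lib_vecs, cell):
    """Hypothetical fusion: concatenate each context vector with its
    character-library vector and run the GRU over the sequence."""
    h = [0.0] * cell.hid_dim
    states = []
    for c, g in zip(context_vecs, char_lib_vecs):
        h = cell.step(c + g, h)
        states.append(h)
    return states


rng = random.Random(0)
DIM, LIB_DIM, HID = 8, 4, 6                   # illustrative sizes
vocab = {"液": 0, "压": 1, "支": 2, "架": 3}    # characters of "hydraulic support"
tables = (make_table(len(vocab), DIM, rng),   # token embeddings
          make_table(16, DIM, rng),           # position embeddings (max length 16)
          make_table(2, DIM, rng),            # sentence embeddings
          make_table(3, DIM, rng))            # task embeddings
ids = [vocab[ch] for ch in "液压支架"]
ctx = input_embedding(ids, sentence_id=0, task_id=0, tables=tables)
lib = [make_table(1, LIB_DIM, rng)[0] for _ in ids]  # stand-in for Word2vec vectors
cell = GRUCell(DIM + LIB_DIM, HID, rng)
states = joint_encode(ctx, lib, cell)
print(len(states), len(states[0]))  # → 4 6
```

In the full model described in the abstract, `ctx` would be the Transformer Encoder outputs rather than raw summed embeddings, `lib` would come from a Word2vec model trained on the character library, and the fused `states` would be passed to the Lattice-LSTM decoder.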

Keywords: fully mechanized coal mining equipment; knowledge graph; ontology model; joint coding; entity recognition

Authors: Han Yibo (韩一搏), Dong Lihong (董立红), Ye Ou (叶鸥)

College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an, Shaanxi 710054

Funding: China Postdoctoral Science Foundation (grant 2020M673446)

Journal: 工矿自动化 (Industry and Mine Automation)

CCTEG Changzhou Research Institute Co., Ltd. (中煤科工集团常州研究院有限公司)

Indexing: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.867
ISSN: 1671-251X
Year, volume (issue): 2024, 50(4)