Data Augmentation Method via Large Language Model for Relation Extraction in Cybersecurity

Original title: 面向网络安全关系抽取的大语言模型数据增强方法

Abstract: Relation extraction can support threat intelligence mining and analysis, providing key information for cybersecurity defense. However, relation extraction in the cybersecurity domain suffers from a shortage of datasets. In recent years, large language models have shown strong text generation ability, which offers solid technical support for data augmentation. To address the shortcomings of traditional data augmentation methods in accuracy and diversity, this paper proposes MGDA, a large language model-based data augmentation method for relation extraction in cybersecurity. MGDA uses a large language model to augment the original data at four granularities (word, phrase, grammar, and semantics), improving diversity while preserving accuracy. Experimental results show that the proposed method improves both performance on the cybersecurity relation extraction task and the diversity of the generated data.
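The four-granularity idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names `build_prompt` and `augment`, the entity-preservation filter, and the `generate` callable (a stand-in for whatever large language model client is used) are all assumptions made for illustration. The abstract only states that augmentation happens at the word, phrase, grammar, and semantic levels.

```python
from typing import Callable, Dict, List

# Hypothetical instructions, one per granularity named in the abstract.
GRANULARITY_PROMPTS: Dict[str, str] = {
    "word": "Replace a few non-entity words with synonyms.",
    "phrase": "Paraphrase one phrase that does not contain an entity.",
    "grammar": "Change the grammatical structure, for example active to passive voice.",
    "semantic": "Rewrite the whole sentence so that it keeps the same meaning.",
}


def build_prompt(sentence: str, head: str, tail: str,
                 relation: str, granularity: str) -> str:
    """Build one augmentation prompt for the given granularity."""
    instruction = GRANULARITY_PROMPTS[granularity]
    return (
        f"{instruction}\n"
        f"Keep the entities '{head}' and '{tail}' unchanged and preserve "
        f"the relation '{relation}' between them.\n"
        f"Sentence: {sentence}"
    )


def augment(sentence: str, head: str, tail: str, relation: str,
            generate: Callable[[str], str]) -> List[dict]:
    """Query the model once per granularity and keep plausible outputs.

    `generate` is an assumed interface: it takes a prompt string and
    returns the model's text output.
    """
    samples = []
    for granularity in GRANULARITY_PROMPTS:
        prompt = build_prompt(sentence, head, tail, relation, granularity)
        candidate = generate(prompt).strip()
        # Assumed accuracy filter: drop outputs that lost an entity
        # mention or that merely repeat the input sentence.
        if head in candidate and tail in candidate and candidate != sentence:
            samples.append({
                "text": candidate,
                "head": head,
                "tail": tail,
                "relation": relation,
                "granularity": granularity,
            })
    return samples
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM service. The filter is deliberately simple, and a real pipeline would likely need a stronger check that the relation label still holds.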

Keywords:

cybersecurity; relation extraction; data augmentation; large language model

Authors:

Li Jiao (李娇), Zhang Yuqing (张玉清), Wu Yabiao (吴亚飚)

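Affiliations:

Beijing Topsec Technology Co., Ltd. (北京天融信科技有限公司), Beijing 100193

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408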

Keywords (Chinese):

网络安全; 关系抽取; 数据增强; 大语言模型

Publication year:

2024

DOI:

10.3969/j.issn.1671-1122.2024.10.001

Netinfo Security (信息网络安全)

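The Third Research Institute of the Ministry of Public Security; Computer Security Professional Committee of the China Computer Federation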

CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.814

ISSN: 1671-1122

Year, volume (issue): 2024, 24(10)