
Large Language Model-Generated Text Detection Based on Linguistic Feature Ensemble Learning

The rapid development of large language models (LLMs) has brought great convenience to daily life and work, but it has also created challenges for individuals and society, so detectors that can identify LLM-generated text are urgently needed. To achieve both strong detection performance and good generalization, this paper proposes EBF Detection, an LLM-generated text detection method based on ensemble learning over linguistic features. EBF Detection combines a fine-tuned pre-trained language model with higher-order statistical features of natural language and uses a decision mechanism to detect LLM-generated text. Experimental results show that EBF Detection achieves an average detection accuracy of 98.72% on in-domain data and 96.79% on out-of-domain data.

Keywords: large language model; LLM-generated text detection; ensemble learning; linguistic feature

项慧、薛鋆豪、郝玲昕


School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China


National Natural Science Foundation of China (61772162); Zhejiang Provincial Key Research and Development Program (2023C03198)

2024

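Netinfo Security (信息网络安全)
The Third Research Institute of the Ministry of Public Security; Computer Security Professional Committee of the China Computer Federation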


Indexed in: CSTPCD; CHSSCD; Peking University Core Journals
Impact factor: 0.814
ISSN:1671-1122
Year, volume (issue): 2024, 24(7)