Detecting ChatGPT Generated Texts Based on Deep Pyramid Convolutional Neural Network

Original title: 基于深度金字塔卷积神经网络的ChatGPT生成文本检测方法

Abstract: [Objective] This paper develops a method for detecting ChatGPT-generated (AI-generated) Chinese texts to prevent the misuse of ChatGPT. [Methods] We constructed three Chinese datasets of different types using a prompt-based approach. We then trained and tested models on these three datasets and compared them along the dimensions of model type, text type, and text length to identify the best-performing detection method. [Results] First, in a comparison of multiple methods, the text classification method based on the Deep Pyramid Convolutional Neural Network (DPCNN) achieved an accuracy of 0.9655 on the test set, outperforming the other methods. Second, the DPCNN model showed good cross-category capability in testing. Finally, text length directly affects the model's accuracy. [Limitations] The Chinese datasets generated with prompts are limited in category diversity: only three categories were constructed and used for model training, whereas real-world text types are diverse. [Conclusions] This paper proposes a method for detecting AI-generated text in the Chinese context; its accuracy is influenced by text type and text length.
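The abstract states that text length affects detection accuracy but does not say how the breakdown was computed. The following is a minimal sketch of one way such a per-length comparison could be done, assuming predictions are already available as (text, true label, predicted label) triples. The function name `accuracy_by_length` and the bucket cut-offs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def accuracy_by_length(samples, bucket_edges=(50, 100, 200)):
    """Return classification accuracy per text-length bucket.

    samples: iterable of (text, true_label, predicted_label) triples.
    bucket_edges: ascending character-count cut-offs (hypothetical values,
    not the ones used in the paper).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, y_true, y_pred in samples:
        # Assign the text to the first bucket whose upper edge it fits under;
        # anything longer than the last edge goes into an overflow bucket.
        bucket = next(
            (f"<={edge}" for edge in bucket_edges if len(text) <= edge),
            f">{bucket_edges[-1]}",
        )
        total[bucket] += 1
        correct[bucket] += int(y_true == y_pred)
    return {bucket: correct[bucket] / total[bucket] for bucket in total}
```

Comparing the per-bucket values then shows whether shorter or longer texts are harder for a given detector. Character counts are used here because Chinese text is not whitespace-delimited; a token-based length would need a tokenizer, which this sketch does not assume.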

Keywords:

ChatGPT; Text Recognition; DPCNN; Cross-Category

Authors:

Fan Zhiwu (范志武), Yao Jinliang (姚金良)

Affiliation:

School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018

Funding:

Key Research and Development Program of Zhejiang Province

Grant number:

2019C03127

Publication year:

2024

DOI:

10.11925/infotech.2096-3467.2023.0609

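Data Analysis and Knowledge Discovery (数据分析与知识发现)

National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core (北大核心), EI

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2024, 8(7)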