Research on the Detection of AI-Generated and Scholar-Written Paper Elements
熊盼¹ 杨浠¹ 郑旭飞¹ 吴徐龙²
Author Information
- 1. College of Computer and Information Science / College of Software, Southwest University, Chongqing 400715, China
- 2. Chongqing Xin'an Cybersecurity Level Evaluation Co., Ltd., Chongqing 401121, China
Abstract

To explore the similarities and differences between AI and human scholars in academic writing, three AI tools (ChatGPT, ERNIE Bot, and Tongyi Qianwen) were taken as research objects, and three kinds of Chinese datasets were constructed from journal papers in the field of computer science: abstract, introduction, and conclusion texts that were AI-generated, scholar-written, or a mixture of the two. A comparative analysis was conducted from the perspectives of part-of-speech tagging, text length, high-frequency words, and high-frequency collocations, and the text quality of the datasets was evaluated with indicators such as Self-BLEU, perplexity, and semantic similarity. We found that scholars typically use more complex sentence structures and write texts that exhibit higher diversity, while AI-generated texts are more easily predicted by large language models. Subsequently, the detection task was cast as a binary classification task and experiments were conducted on 13 baseline models; a DeBERTa-BiGRU model was further proposed, reaching an accuracy of 91% and outperforming the other classifiers. This method can help prevent academic misconduct, provide a detection tool for academic journal editors, and maintain the credibility of the academic community.
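The Self-BLEU diversity metric cited in the abstract can be sketched as follows. This is an illustrative pure-Python version (bigram precision against the rest of the corpus, without a brevity penalty), not the paper's exact implementation, and the sample texts are hypothetical; lower Self-BLEU indicates higher corpus diversity.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_n(hyp, refs, n=2):
    """Clipped n-gram precision of hyp against multiple references."""
    hyp_counts = Counter(ngrams(hyp, n))
    if not hyp_counts:
        return 0.0
    # For each n-gram, keep the maximum count seen in any reference.
    max_ref = Counter()
    for ref in refs:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

def self_bleu(corpus, n=2):
    """Average BLEU-n of each text scored against all other texts."""
    scores = [bleu_n(text, corpus[:i] + corpus[i + 1:], n)
              for i, text in enumerate(corpus)]
    return sum(scores) / len(scores)

# Hypothetical tokenized corpus for illustration only.
texts = [
    "the model generates fluent text".split(),
    "the model generates diverse text".split(),
    "scholars write with complex sentence structures".split(),
]
print(round(self_bleu(texts), 3))
```

Because each text is compared with the remainder of the same corpus, a corpus of near-identical generations scores close to 1, which matches the abstract's finding that scholar-written texts (lower Self-BLEU) are more diverse.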
Key words

text classification / artificial intelligence generated content / deep learning / text detection
Year of Publication
2024