Research on the Detection of AI-Generated and Scholar-Written Paper Elements
熊盼¹ 杨浠¹ 郑旭飞¹ 吴徐龙²
Author Information
- 1. College of Computer and Information Science / College of Software, Southwest University, Chongqing 400715, China
- 2. Chongqing Xin'an Cybersecurity Level Evaluation Co., Ltd., Chongqing 401121, China
Abstract

To explore the similarities and differences between AI and human scholars in academic writing, three AI tools (ChatGPT, ERNIE Bot, and Tongyi Qianwen) were taken as research objects, and three kinds of Chinese datasets were constructed from journal papers in the field of computer science: abstract, introduction, and conclusion texts that were AI-generated, scholar-written, or a mixture of the two. A comparative analysis was conducted from the perspectives of part-of-speech tagging, text length, high-frequency words, and high-frequency collocations, and the text quality of the datasets was evaluated with indicators such as Self-BLEU, perplexity, and semantic similarity. We found that scholars typically use more complex sentence structures and write texts that exhibit higher diversity, while AI-generated texts are more easily predicted by large language models. Subsequently, the detection task was cast as a binary classification task and experiments were conducted on 13 baseline models; a DeBERTa-BiGRU model was further proposed, reaching an accuracy of 91% and outperforming the other classifiers. This method can help prevent academic misconduct, provide a detection tool for academic journal editors, and maintain the credibility of the academic community.
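The Self-BLEU diversity metric cited in the abstract can be sketched as follows. This is an illustrative pure-Python version (bigram precision against the rest of the corpus, without a brevity penalty), not the paper's exact implementation, and the sample texts are hypothetical; lower Self-BLEU indicates higher corpus diversity.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_n(hyp, refs, n=2):
    """Clipped n-gram precision of hyp against multiple references."""
    hyp_counts = Counter(ngrams(hyp, n))
    if not hyp_counts:
        return 0.0
    # For each n-gram, keep the maximum count seen in any reference.
    max_ref = Counter()
    for ref in refs:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

def self_bleu(corpus, n=2):
    """Average BLEU-n of each text scored against all other texts."""
    scores = [bleu_n(text, corpus[:i] + corpus[i + 1:], n)
              for i, text in enumerate(corpus)]
    return sum(scores) / len(scores)

# Hypothetical tokenized corpus for illustration only.
texts = [
    "the model generates fluent text".split(),
    "the model generates diverse text".split(),
    "scholars write with complex sentence structures".split(),
]
print(round(self_bleu(texts), 3))
```

Because each text is compared with the remainder of the same corpus, a corpus of near-identical generations scores close to 1, which matches the abstract's finding that scholar-written texts (lower Self-BLEU) are more diverse.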
Key words

text classification / artificial intelligence generated content / deep learning / text detection
Year of Publication
2024