Linguistic Steganalysis Method Based on Meta-learning and Domain Adaptation

Original title: 基于元学习及域自适应的生成式文本隐写分析方法

Abstract: Generative linguistic steganography produces fluent, natural steganographic text, which poses a major challenge for linguistic steganalysis. Neural-network-based steganalysis methods detect such text well when the source domain (training text) and the target domain (testing text) use the same steganography algorithm and are trained on the same corpus. In practice, however, neither the steganography algorithm nor the training corpus behind a text under inspection can be known in advance, so most neural-network-based methods are hard to apply. To address this problem, we introduce the basic ideas of meta-learning and domain adaptation to improve the generalization of the steganalysis model. We build a word-importance semantic encoding module on the pre-trained language representation model RoBERTa to fully extract the semantic features of the text. On top of these features, we propose an inter-word correlation multi-scale perception module and an attention mechanism to capture the changes that steganography causes in the relationships between adjacent and non-adjacent words. Experimental results show that, across multiple cross-domain scenarios, the method improves detection accuracy by an average of 9% over the existing text steganalysis method Fs-Stega.

Keywords:

generative linguistic steganalysis; cross-domain detection; domain adaptation; few-shot learning; meta-learning

Authors:

Li Songbin (李松斌), Du Hui (杜辉), Wang Jingang (王津港)

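Affiliations:

South China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Haikou 570100

University of Chinese Academy of Sciences, Beijing 100049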

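Funding:

Hainan Province Major Science and Technology Program; Independently Deployed Project of the Institute of Acoustics, Chinese Academy of Sciences (three grants)

Project numbers:

ZDKJ2020010; QYTS202015; QYTS202115; MBDX202117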

Publication year:

2024

DOI:

10.20064/j.cnki.2095-347X.2024.01.004

Journal: 网络新媒体技术 (Network New Media Technology)

Institute of Acoustics, Chinese Academy of Sciences

CSTPCD

Impact factor: 0.208

ISSN: 2095-347X

Year, volume (issue): 2024, 13(1)

Number of references: 4