Flow-based Text Style Transfer Model (基于流的文本风格迁移模型)

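Abstract (translated from Chinese): In recent years, the text style transfer (TST) task has attracted wide attention from researchers. Existing studies use variational autoencoders, generative adversarial networks, and similar methods to first extract a style-independent content representation from the input text, and then have the decoder generate text in the target style, either by adding constraints or by combining the representation with a style embedding vector. These methods have made good progress on tasks such as sentiment transfer and formality transfer and have improved the accuracy of text style transfer on non-parallel datasets. Two problems remain: the content and style of the transferred text can be mismatched, and the core semantics of the original text are hard to preserve after transfer. This paper proposes a text style transfer method based on flow models. After an initial encoding of the text, the method uses neural spline flows to construct a series of invertible mappings. The forward pass of the flow maps the whole sequence from the original hidden-state encoding space to a latent distribution. Under this distribution, an affine coupling transformation modifies the style features of the sequence, and the inverse pass of the flow then maps the recomposed sequence back to the initial hidden-state encoding space. Finally, the decoder is trained jointly on the initial and recomposed hidden-state sequences to generate the target text. Because the transformation built from the flow model is invertible, no distributional information is lost when the hidden states are transformed, which improves content preservation in the TST task. Because the recomposed hidden-state sequences used to train the decoder are derived from the initial hidden-state sequences, the mismatch between content and style after transfer is also reduced. In addition, this paper proposes a new content-preservation evaluation metric that considers both transfer accuracy and content preservation to assess overall model performance. Experimental results on datasets commonly used for transfer tasks show that the proposed method achieves good content preservation while maintaining high style transfer accuracy, and shows a certain advantage in overall performance.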
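The forward-modify-inverse pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single toy affine coupling layer stands in for the stacked neural spline flows, the `conditioner` function is a hand-written stand-in for a learned network, and the names `coupling_forward`, `coupling_inverse`, `recombine`, and the 4-dimensional vectors are hypothetical.

```python
import math


def conditioner(head):
    """Hypothetical stand-in for a learned network: maps the untouched
    coordinates to a bounded log-scale and a shift."""
    log_scale = math.tanh(sum(head))
    shift = 0.5 * sum(v * v for v in head)
    return log_scale, shift


def coupling_forward(z, n_keep):
    """Encoding space -> latent space: copy the first n_keep coordinates
    and affinely transform the rest, conditioned on the copied part."""
    head, tail = z[:n_keep], z[n_keep:]
    log_scale, shift = conditioner(head)
    return head + [v * math.exp(log_scale) + shift for v in tail]


def coupling_inverse(u, n_keep):
    """Latent space -> encoding space: exact inverse of coupling_forward."""
    head, tail = u[:n_keep], u[n_keep:]
    log_scale, shift = conditioner(head)
    return head + [(v - shift) * math.exp(-log_scale) for v in tail]


def recombine(u_a, u_b, n_keep):
    """Keep the leading coordinates of a and take the trailing ones from b."""
    return u_a[:n_keep] + u_b[n_keep:]


# Toy hidden states for two sentences from opposite style domains (assumed values).
z_a = [0.3, -1.2, 0.8, 2.0]
z_b = [1.1, 0.4, -0.7, 0.5]
N_KEEP = 2

u_a = coupling_forward(z_a, N_KEEP)
u_b = coupling_forward(z_b, N_KEEP)
u_star = recombine(u_a, u_b, N_KEEP)        # recomposed latent code
z_star = coupling_inverse(u_star, N_KEEP)   # mapped back to the encoding space
round_trip = coupling_inverse(u_a, N_KEEP)  # should reproduce z_a

print("round trip error:", max(abs(p - q) for p, q in zip(round_trip, z_a)))
print("recomposed hidden state:", z_star)
```

Because the inverse undoes the forward pass exactly (up to floating-point error), nothing about `z_a` is lost by moving into the latent space and back, which is the property the abstract relies on for content preservation.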

English title: Flow-based Text Style Transfer Model

English abstract:

Objective: In recent years, numerous text style transfer (TST) methods have relied on semi-supervised and unsupervised learning. The central idea of these methods is to map the text into a latent space, which enables content and style representations to be separated and style to be transferred. Text style transfer converts the attributes or style of a text while preserving the semantic content of the source sentences. Existing methods have made significant progress on the challenges of traditional TST tasks. However, they still share common limitations, including mismatches between the content and style of the transferred text and difficulty in maintaining the core semantics of the original text. Minimizing the loss of textual information during transfer remains critical to resolving content-style mismatch and preserving core semantics.

Method: This study introduces a flow-based text style transfer model. It adopts neural spline flows (NSF) as the foundational flow model, leveraging the potential of flows in style transfer tasks. Flows that use affine coupling or autoregressive transformations provide accurate density evaluation and sampling. The model comprises a conditional encoder, a series of invertible flows, a style discriminator, and two conditional decoders. The encoder ($X \to Z$) transforms input sentences into the latent space. The forward process of the flow, $f: Z \to \tilde{Z}$, abstracts the latent space encoding to a higher dimension, separates content and style, and reconstructs them. The inverse process, $f^{-1}: \tilde{Z} \to Z$, returns to the latent space. The conditional decoder ($Z \to X$) decodes the hidden state in the latent space into the target sentence required by the TST task.

To illustrate the transfer of a sentence from style 'a' to style 'b', the non-parallel datasets of various styles are first partitioned into two domains, labeled 'a' and 'b', that represent opposing styles. The input sentences $x^{src}_a$ and $x^{src}_b$ belong to different style domains and have non-parallel text content. Feeding $x^{src}_a$ and $x^{src}_b$ into the encoder yields their initial latent encodings, the hidden state sequences $z_a$ and $z_b$. The flow then maps $z_a$ and $z_b$ to another distribution space, yielding the encodings $\tilde{z}_a$ and $\tilde{z}_b$. In $\tilde{Z}$, the latent space produced by the flow, $\tilde{z}_a$ is retained to reconstruct content while style is separated. This study extends neural spline flows to further process the encoded latent variable $\tilde{z}$. The coupling transformation divides its input into two parts and computes $\theta = \mathrm{NN}(z_{1:n-1})$ and $\tilde{z}_i = f_{\theta_i}(z_i)$. The model then sets $\tilde{z}^*_{b,1:n-1} = \tilde{z}_{a,1:n-1}$ and returns $\tilde{z}^*_b = [\tilde{z}_{a,1:n-1}, \tilde{z}_{b,n:d}]$, which completes the conversion from $\tilde{z}_a$ to $\tilde{z}^*_b$. NSF proposes monotonic rational-quadratic transforms as a replacement for the additive or affine transformations used in coupling and autoregressive layers, which increases flexibility while preserving exact invertibility. The study uses monotonic rational-quadratic splines and their inverses as building blocks to implement the functions $f_i$ and $f_i^{-1}$, where each interval is defined by a monotonically increasing rational-quadratic function. The inverse process of the flow is then applied to map $\tilde{z}^*_b$ back to the $Z$ space as $z^*_b$. In addition, a pre-trained discriminator is used during training to compare the reconstructed, stylized $z^*_b$ with its original content source $z_a$, and the results are used to optimize the decoder. Finally, the conditional decoder $p_{a \to b}(x \mid z)$ produces the transferred text $x^{rec}_b$. Similarly, to transfer a style 'b' sentence to style 'a', the reconstruction object is $\tilde{z}_b$, which yields $z^*_a$, and a second decoder, $p_{b \to a}(x \mid z)$, decodes it to obtain $x^{rec}_a$.

Results and Discussion: The automatic evaluation results indicate that the proposed method achieves the highest transfer accuracy (ACC) and ref-BLEU scores among the compared baseline models in both the sentiment transfer and political stance transfer tasks. In the instance analysis, several example sentences from the outputs of the sentiment transfer task are compared further; the proposed method outperforms the selected baseline model when the text contains multiple key pieces of information. The human evaluation results show that the proposed method performs better overall than the baseline in text style transfer tasks. In addition, every model performs better in sentiment transfer than in political stance transfer. This difference is attributed to the subtle and implicit nature of style information in political stance texts compared with sentiment texts, and extracting these features effectively remains an important focus for future work. To express the overall performance of the models intuitively, this study introduces a new metric that evaluates content preservation (CP) more comprehensively during style transfer. The experiment modifies the criteria proposed by Krishna et al. and combines these three scores into a single sum. This combination supports the evaluation of content preservation by considering the similarity between the reference text and the original text, and therefore gives a more complete picture of how well a model preserves essential content during text style transfer. The proposed method achieves a comprehensive score 3% higher than the second-best model in sentiment transfer and 5% higher in political stance transfer. These results suggest that using neural spline flows to handle latent space sequences improves content preservation while keeping transfer accuracy balanced.

Conclusion: This study proposes a flow-based text style transfer model. It constructs a transformation function based on neural spline flows to address the problem of preserving text content during the TST task. It aims to reduce content and style mismatches after transfer by jointly training the decoder with recomposed hidden state sequences derived from the initial hidden state sequences. The neural spline flow-based model manipulates latent text sequences with high preservation, and the number of stacked flows can be adjusted to achieve disentanglement for different text styles and contents. The flexibility and analytic invertibility of autoregressive transformations in neural spline flows reduce the loss of original content and the semantic damage incurred when encoding latent space hidden states.
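The monotonic rational-quadratic spline mentioned in the Method section can be sketched as follows. This is an illustrative sketch rather than the paper's code: it implements the standard rational-quadratic bin formula and its closed-form inverse for a single scalar, and the knot positions `XS`, `YS` and knot derivatives `DS` are fixed, hypothetical values (in a neural spline flow they would be produced by a network). Outside the knot range the transform is assumed to be the identity.

```python
import bisect
import math

# Hypothetical spline parameters: increasing knots and positive derivatives.
XS = [-3.0, -1.0, 0.5, 3.0]   # knot positions on the input axis
YS = [-3.0, -2.0, 1.5, 3.0]   # knot positions on the output axis
DS = [1.0, 0.6, 1.8, 1.0]     # derivatives at the knots (1.0 at the ends)


def rq_forward(x, xs=XS, ys=YS, ds=DS):
    """Monotonic rational-quadratic spline; identity outside [xs[0], xs[-1]]."""
    if x <= xs[0] or x >= xs[-1]:
        return x
    k = bisect.bisect_right(xs, x) - 1           # index of the bin containing x
    w, h = xs[k + 1] - xs[k], ys[k + 1] - ys[k]  # bin width and height
    s = h / w                                    # average slope of the bin
    xi = (x - xs[k]) / w                         # position inside the bin, in [0, 1)
    num = h * (s * xi * xi + ds[k] * xi * (1.0 - xi))
    den = s + (ds[k + 1] + ds[k] - 2.0 * s) * xi * (1.0 - xi)
    return ys[k] + num / den


def rq_inverse(y, xs=XS, ys=YS, ds=DS):
    """Closed-form inverse: solve the per-bin quadratic for xi."""
    if y <= ys[0] or y >= ys[-1]:
        return y
    k = bisect.bisect_right(ys, y) - 1
    w, h = xs[k + 1] - xs[k], ys[k + 1] - ys[k]
    s = h / w
    d_sum = ds[k + 1] + ds[k] - 2.0 * s
    dy = y - ys[k]
    a = h * (s - ds[k]) + dy * d_sum
    b = h * ds[k] - dy * d_sum
    c = -s * dy
    xi = 2.0 * c / (-b - math.sqrt(b * b - 4.0 * a * c))
    return xs[k] + xi * w


for x in (-2.0, 0.0, 1.7):
    y = rq_forward(x)
    print(x, "->", y, "->", rq_inverse(y))
```

The inverse needs only the root of a quadratic, so both directions cost about the same. This is what makes the spline usable as the element-wise function and its inverse inside an invertible flow.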

English keywords:

text generation; text style transfer; neural network; neural spline flows

Authors:

Zhang Zihan (张子涵), Dai Jinqiao (代金鞘), Yang Pin (杨频)

Affiliation:

School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China

Keywords (translated from Chinese):

text generation; text style transfer; neural network; neural spline flows

Publication year:

2025

DOI:

10.15961/j.jsuese.202200639

Advanced Engineering Sciences (工程科学与技术)

Sichuan University

PKU Core Journal (北大核心)

Impact factor: 0.913

ISSN: 2096-3246

Year, volume (issue): 2025, 57(1)