A Survey of Text Generation and Evaluation Based on Intrinsic Quality Constraints
In recent years, text generation models represented by ChatGPT, which aim to adapt to complex scenarios and meet a wide range of human application needs, have become a focus of both academia and industry. However, the strength of large language models (LLMs) such as ChatGPT, namely their high fidelity to user intent, can conceal factual errors, and these models still depend on prompt content to control fine-grained generation quality and domain adaptability. Research on text generation methods centered on intrinsic quality constraints therefore remains important. Building on a comparative study of key content generation models and techniques from recent years, this paper defines the basic form of text generation under intrinsic quality constraints, together with six quality features derived from the principles of "credibility, expressiveness and elegance" (信、达、雅). For these six quality features, we analyze and summarize generator model designs and related algorithms. We also summarize a range of automatic and human evaluation metrics and methods organized around the different intrinsic quality features. Finally, we discuss future research directions for intrinsic quality constraint techniques.
natural language processing; language model; text generation; text quality; text evaluation