SCIENCE CHINA Information Sciences (English edition), 2024, Vol. 67, Issue 5: 39-51. DOI: 10.1007/s11432-021-3536-5

CPT: a pre-trained unbalanced transformer for both Chinese language understanding and generation

Yunfan SHAO¹, Zhichao GENG¹, Yitao LIU¹, Junqi DAI¹, Hang YAN¹, Fei YANG², Zhe LI², Hujun BAO², Xipeng QIU¹

Author information

  • 1. School of Computer Science, Fudan University, Shanghai 200433, China; Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China
  • 2. Zhejiang Lab, Hangzhou 311121, China

Abstract

In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese pre-trained unbalanced transformer (CPT). Unlike previous Chinese PTMs, CPT is designed to exploit the knowledge shared between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, together with the shared encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With its partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge for both NLU and NLG with the two decoders and (2) be fine-tuned flexibly to fully exploit the potential of the model. Moreover, the unbalanced transformer saves computational and storage costs, which makes CPT competitive and greatly accelerates inference for text generation. Experimental results on a wide range of Chinese NLU and NLG tasks show the effectiveness of CPT.
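The routing described in the abstract can be sketched as follows. This is a minimal structural illustration of the unbalanced layout, not the paper's implementation: the layer counts (a deep shared encoder and two shallow decoders) and all names here are illustrative assumptions.

```python
# Hypothetical sketch of CPT's partially shared, unbalanced layout.
# Layer counts (10 encoder / 2 + 2 decoder layers) are assumptions chosen
# to illustrate "unbalanced"; the actual configuration is in the paper.

class CPTSketch:
    """Deep shared encoder feeding two shallow task-specific decoders."""

    def __init__(self, enc_layers=10, dec_layers=2):
        self.encoder = [f"enc_{i}" for i in range(enc_layers)]      # shared by both tasks
        self.u_decoder = [f"u_dec_{i}" for i in range(dec_layers)]  # understanding path (MLM)
        self.g_decoder = [f"g_dec_{i}" for i in range(dec_layers)]  # generation path (DAE)

    def active_layers(self, task):
        """Return the layer stack used when fine-tuning for `task`."""
        if task == "nlu":
            return self.encoder + self.u_decoder  # classification-style heads
        if task == "nlg":
            return self.encoder + self.g_decoder  # autoregressive decoding
        raise ValueError(f"unknown task: {task}")

model = CPTSketch()
print(len(model.active_layers("nlu")), len(model.active_layers("nlg")))
```

Because each autoregressive decoding step only has to run the shallow generation decoder (the encoder output is computed once), a layout like this is where the abstract's inference speedup for text generation would come from.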

Key words

pre-trained model; transformer; language model; generation; unified model


Funding

National Key Research and Development Program of China(2020AAA0108702)

National Natural Science Foundation of China(62022027)

Publication year

2024

Journal: SCIENCE CHINA Information Sciences (English edition)
Publisher: Chinese Academy of Sciences
Indexed in: CSTPCD, EI
Impact factor: 0.715
ISSN: 1674-733X
References: 49