[Purpose/Significance] This paper systematically reviews and analyzes research on pre-trained language models (PLMs) in information science (IS) and intelligence work, providing a reference for integrating pre-trained models into IS research. [Method/Process] First, it briefly describes the basic principles and development of PLMs and summarizes the PLMs widely used in IS research. Second, it analyzes research hotspots at the macro level and summarizes related achievements in information organization, information retrieval, and information mining at the micro level, examining in detail the application methods, improvement strategies, and performance of PLMs. Finally, it discusses the opportunities and challenges for PLMs in IS with respect to corpora, training, evaluation, and application. [Result/Conclusion] Currently, BERT and its variants are the most widely used and best-performing models in IS. The paradigm combining neural networks with fine-tuning is applied in a variety of scenarios, especially domain information extraction and text classification, and its performance can be improved through continued pre-training, external knowledge enhancement, and architecture optimization. Key issues to consider in the future are balancing the scale and quality of training corpora, improving the usability and security of models, evaluating the real capabilities of models accurately and multi-dimensionally, and accelerating the deployment of subject knowledge mining tools.
Key words
information science / intelligence work / pre-trained language models / natural language processing / PLM
Funding
General Program of the National Natural Science Foundation of China (71974094)
Fundamental Research Funds for the Central Universities of Ministry of Education of China, Nanjing University (0108/14370317)