
Prompt-based Few-shot Learning for Academic Document Classification

This paper studies the long-tail phenomenon and the problem of newly added categories in academic document classification, and proposes a few-shot document classification method based on prompt learning, aiming at automatic classification in low-resource scenarios. Drawing on the text representation and generation capabilities of large-scale pre-trained language models (PLMs), it analyzes, within the prompt learning framework, how different prompt templates, document fields, category representations, and numbers of samples, among other factors, affect low-resource document classification. Experimental results show that, with well-designed prompt templates, category representations, and document fields, the model classifies documents effectively in low-resource scenarios (an F1 score of about 85% in the 50-shot setting) and is an important complement to traditional document classification algorithms. However, it still makes classification errors on fine-grained document classification, which needs further improvement.
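To make the roles of the prompt template, the document fields, and the category representation more concrete, here is a minimal Python sketch. It is not the paper's implementation: the template wording, the label words in `VERBALIZER`, the field names, and the `toy_score` function (a stand-in for a pre-trained masked language model) are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical category representation: each category is described by a few
# label words. The paper's actual label words are not given in the abstract.
VERBALIZER: Dict[str, List[str]] = {
    "library science": ["library", "cataloging"],
    "computer science": ["computing", "software"],
}

# Hypothetical cloze question appended to the selected document fields.
CLOZE = "This paper belongs to the field of [MASK]."


def build_prompt(doc: Dict[str, str], fields: List[str]) -> str:
    """Join the selected, non-empty document fields and append the cloze question."""
    parts = [f"{name.capitalize()}: {doc[name]}." for name in fields if doc.get(name)]
    return " ".join(parts + [CLOZE])


def classify(prompt: str, score_word: Callable[[str, str], float]) -> str:
    """Return the category whose label words have the highest mean score."""
    def category_score(category: str) -> float:
        words = VERBALIZER[category]
        return sum(score_word(prompt, word) for word in words) / len(words)

    return max(VERBALIZER, key=category_score)


def toy_score(prompt: str, word: str) -> float:
    """Stand-in for a masked language model: counts how often the word occurs."""
    return float(prompt.lower().count(word))


doc = {
    "title": "Cataloging rules in the digital library",
    "keywords": "library; metadata",
}
prompt = build_prompt(doc, ["title", "keywords"])
predicted = classify(prompt, toy_score)
```

In a real setting, `toy_score` would be replaced by the probability a pre-trained language model assigns to each label word at the `[MASK]` position, and the few labeled samples per category would be used to tune the model or the prompt. Changing the `fields` argument or the entries in `VERBALIZER` corresponds to the kind of factors the abstract says were compared.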

few-shot learning; prompt learning; academic document classification; pre-trained language model

An Bo (安波)

Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences

National Social Science Fund of China

22BTQ010

2024

Library Tribune (图书馆论坛)
Sun Yat-sen Library of Guangdong Province (广东省立中山图书馆)

CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心)
Impact factor: 1.864
ISSN:1002-1167
Year, volume (issue): 2024, 44(5)