Large Model-Driven Academic Text Mining: Inference-End Instruction Strategy Construction and Capability Evaluation

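The strong task-comprehension and instruction-following abilities of large language models allow users to complete complex information-processing tasks through simple instruction-based interaction. The field of scientific literature analysis is actively exploring applications of large models, but no systematic study of instruction engineering techniques and model capability boundaries has yet emerged. Taking academic text mining as its entry point, this paper designs inference-end instruction strategies from the perspectives of in-context learning and chain-of-thought reasoning, and builds an evaluation framework for the professional academic text mining capabilities of large models that covers six tasks across four capability dimensions: text classification, information extraction, text reasoning, and text generation. Seven mainstream instruction-tuned models from China and abroad were selected for experiments comparing the applicable scope of different instruction strategies and the professional capabilities of models with different parameter counts. The results show that complex instruction strategies such as few-shot prompting and chain-of-thought yield no significant gains on classification tasks but perform well on harder tasks such as extraction and generation. With instruction guidance, large models at the hundred-billion-parameter scale can achieve results close to those of fully trained deep learning models, whereas for models at the billion or ten-billion scale, inference-end instruction strategies hit a clear ceiling. To embed large models deeply into the field of scientific and technical intelligence, domain adaptation of model parameters at the tuning end is still required at the current stage.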
Large Language Model-Driven Academic Text Mining: Construction and Evaluation of Inference-End Prompting Strategy
The task comprehension and instruction-following abilities of large language models enable users to complete complex information-processing tasks through simple interactive instructions. Scientific literature analysts are actively exploring the application of large language models; however, a systematic study of instruction engineering techniques and the capability boundaries of large models has not yet been conducted. Focusing on academic text mining, this study designs inference-end prompting strategies and establishes a comprehensive evaluation framework for large language model-driven academic text mining, encompassing text classification, information extraction, text reasoning, and text generation, covering six tasks in total. Seven mainstream instruction-tuned models were selected for the experiments, to compare the applicability of different prompting strategies and the professional capabilities of models of different sizes. The experiments indicate that complex instruction strategies, such as few-shot and chain-of-thought, are not effective in classification tasks, but perform well in more challenging tasks, such as extraction and generation. With instruction guidance, models at the hundred-billion-parameter scale achieve results comparable to those of fully trained deep-learning models. However, for models at the scale of billions or tens of billions of parameters, there is a clear upper limit to inference-end instruction strategies. Achieving deep integration of large language models into the field of scientific intelligence still requires adaptation of the model to the domain at the tuning end.

large language model; academic text mining; instruction engineering; capability evaluation

Lu Wei (陆伟), Liu Yinpeng (刘寅鹏), Shi Xiang (石湘), Liu Jiawei (刘家伟), Cheng Qikai (程齐凯), Huang Yong (黄永), Wang Lei (汪磊)

School of Information Management, Wuhan University, Wuhan 430072

Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072

large models; academic text mining; instruction engineering; capability evaluation

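Key Program of the National Natural Science Foundation of China; General Program of the National Natural Science Foundation of China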

72234005; 72174157

2024

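Journal of the China Society for Scientific and Technical Information (情报学报)
China Society for Scientific and Technical Information; Institute of Scientific and Technical Information of China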

Indexed in: CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心)
Impact factor: 1.296
ISSN: 1000-0135
Year, volume (issue): 2024, 43(8)