
Exploring the Application of Large Language Models in Classification Indexing

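[Purpose/Significance] Classification indexing of documents is one of the basic tasks of libraries and other information institutions, and limited staff can no longer classify the huge volume of incoming documents. Large language models (LLMs), with their strong natural language understanding and processing capabilities, are used for natural language tasks such as text generation, automatic summarization, and text classification; they can be integrated into the whole document indexing process and help relieve the pressure of classification indexing. [Method/Process] Drawing on the long-term working practice of the National Newspaper Index (《全国报刊索引》), this study explores how to bring LLMs into the classification indexing workflow from three entry points: reducing the reading burden on indexers, using the LLM directly for classification, and combining the LLM with an automatic indexing model, with the aim of improving indexing efficiency. [Results/Conclusions] Through a series of comparative tests and analyses, a Prompt-assisted topic classification model and the ACBKSY automatic indexing model were designed. The Prompt-assisted topic classification model helps indexers quickly grasp the key points of a document and reduces their reading burden. The ACBKSY model improved overall classification accuracy by 2.16% and non-rejection accuracy by 3.77%. On this basis the actual indexing workflow was optimized. The optimized workflow is already in use for indexing documents in the R and F main classes and can improve indexing efficiency by 1.1 to 1.4 times.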
Exploration and Practice of Classification Indexing Combined with Large Language Models
[Purpose/Significance] Document classification is one of the fundamental tasks of information service institutions such as libraries. Limited human resources make it challenging to categorize the vast number of documents, and current automatic indexing technologies are not yet fully integrated into the entire indexing process. Large language models (LLMs), with their excellent natural language understanding and processing capabilities, have been used for natural language processing tasks such as text generation, automatic summarization, and text classification, and can be integrated into the entire classification process. [Method/Process] Drawing on the long-term practical experience of the National Newspaper Index, this study examines how to introduce LLMs into the classification and indexing process from three aspects: reducing the reading pressure on indexers, using LLMs directly for classification, and combining them with automatic indexing models. A prompt-assisted topic classification model is designed that uses the LLM to analyze and extract document content and guides it to output concise summaries. This allows indexers to quickly understand the basic situation of the research, grasp the key concepts and their interrelationships, and thus quickly and accurately determine how to classify the documents. [Results/Conclusions] Where the LLM cannot be used directly for text classification based on the Chinese Library Classification (CLC), it is combined with existing automatic models to form the ACBKSY model. The overall classification accuracy of the model improved by 2.16%, and the non-rejection accuracy increased by 3.77%. On this basis, the actual indexing workflow is optimized to make the indexing work more systematic and coherent, so that every step from document input to final classification is more efficient and accurate. The optimized workflow has been put into use for the R and F categories of the collection, and it can improve indexing efficiency by 1.1 to 1.4 times. This paper still has shortcomings: the LLM was not given sufficient learning to fully understand the category settings of the CLC and some simple division rules, and, because classification based on the CLC is essentially hierarchical, how to guide the LLM to output classification results step by step over multiple rounds of dialogue needs further study.
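The abstract reports two figures, overall accuracy and non-rejection accuracy, without defining them. The sketch below shows one common way such metrics are computed for a classifier that may decline to label a document. The definitions used here are assumptions, not the authors' stated method: overall accuracy counts a rejected document as not correct, and non-rejection accuracy is computed only over the documents that received a label. The function name `accuracy_metrics` and the sample class numbers are hypothetical.

```python
from typing import Optional, Sequence


def accuracy_metrics(
    predicted: Sequence[Optional[str]], gold: Sequence[str]
) -> dict:
    """Compute accuracy figures for a classifier that can reject.

    A prediction of None means the model declined to assign a class.
    These definitions are assumed for illustration; the source abstract
    does not spell out how its metrics are calculated.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    total = len(gold)
    # Keep only the documents for which the model produced a label.
    answered = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in answered if p == g)

    return {
        # Rejected documents count against the overall figure.
        "overall_accuracy": correct / total if total else 0.0,
        # Rejected documents are excluded from the denominator.
        "non_rejection_accuracy": correct / len(answered) if answered else 0.0,
        "rejection_rate": (total - len(answered)) / total if total else 0.0,
    }


# Made-up example: four documents, one rejected, two labeled correctly.
sample_predicted = ["R44", None, "F27", "R51"]
sample_gold = ["R44", "F83", "F27", "R54"]
sample_metrics = accuracy_metrics(sample_predicted, sample_gold)
```

Under these assumed definitions the two figures diverge whenever the model rejects some documents, which is why a combined model can move them by different amounts.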

automatic indexing; large language model (LLM); ERNIE Bot; GPT-4

Jiang Peng (姜鹏), Ren Yan (任龑), Zhu Beilin (朱蓓琳)


Shanghai Library, Shanghai 200030

classification indexing; large language model; ERNIE Bot (文心一言); GPT-4

Shanghai Library "2151 Project"

2024

Journal of Library and Information Science in Agriculture (农业图书情报学报)
Agricultural Information Institute, Chinese Academy of Agricultural Sciences


Impact factor: 0.48
ISSN:1002-1248
Year, volume (issue): 2024, 36(5)