Research on Retrieval and Analysis Methods of Papers Based on Large Language Models

Efficient and accurate retrieval of relevant academic papers is crucial in modern academic research. Traditional retrieval methods usually rely on precise keyword input, which requires users to have enough domain knowledge to choose and use appropriate terminology. To address this issue, we explore a method that uses Large Language Models (LLMs) for content-based retrieval and analysis of papers, aiming to lower the barrier to retrieval imposed by specialized search terms while also supporting some analysis of paper content. First, a content-based framework for paper retrieval and analysis is proposed; built on paper parsing and a vector database, it handles retrieval and analysis for single papers, multiple papers, and vague lay descriptions. Second, a paper parsing method is designed, together with LLM prompts for extracting the main content of a paper; the prompts guide the model to focus on the paper's representative key information, thereby improving retrieval performance, and comparative analysis identified the prompts that extract information more effectively. Finally, comparative experiments demonstrate the feasibility and effectiveness of the method: for retrieval based on the full text of a paper and on vague lay descriptions, mAP reaches 98.47% and 99.51%, respectively.
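The abstract reports retrieval quality as mAP over a vector database but does not state the embedding model, the similarity measure, or the database used. The sketch below is therefore only an illustration under stated assumptions: cosine similarity as the ranking score, a plain dictionary standing in for the vector database, and hand-written toy vectors with hypothetical paper ids in place of LLM-derived embeddings. It shows how a ranked result list can be scored with the standard definition of average precision and averaged into mAP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, index):
    """Return paper ids sorted by descending cosine similarity to the query."""
    return sorted(index, key=lambda pid: cosine(query_vec, index[pid]), reverse=True)


def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            hits += 1
            total += hits / k
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy stand-in for a vector database: hypothetical ids, made-up 3-d vectors.
index = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.8, 0.6, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
    "paper_d": [0.0, 0.0, 1.0],
}

# Each query pairs an (assumed) query embedding with its set of relevant papers.
queries = [
    ([1.0, 0.1, 0.0], {"paper_a", "paper_b"}),
    ([0.0, 0.2, 1.0], {"paper_d"}),
]

runs = [(rank(q, index), rel) for q, rel in queries]
print(mean_average_precision(runs))
```

In a real pipeline the query vector would come from embedding either the extracted content of a paper or a lay description, and the ranking would be delegated to the vector database; only the scoring functions would stay as they are here.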

document retrieval; document analysis; large language models; prompt engineering; academic papers

XIE Mian, CHEN Gang, YU Xiaohan (解勉、陈刚、余晓晗)


College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China


2024

Computer Technology and Development (计算机技术与发展)
Shaanxi Computer Society (陕西省计算机学会)


CSTPCD
Impact factor: 0.621
ISSN:1673-629X
Year, Volume (Issue): 2024, 34(12)