Evaluation of Large Language Models for Entity Search
Entity search, a critical task in information retrieval, aims to accurately identify the entities that match a user query within a vast collection of documents. It plays a key role in enhancing user experience, enabling cross-domain applications, facilitating big data analysis, and supporting intelligent services. Large language models (LLMs) have demonstrated outstanding performance across a wide range of fields, and their powerful semantic understanding and generation capabilities can significantly improve the accuracy of entity search. However, the evaluation of LLMs on entity search tasks has not yet been fully explored. Therefore, an evaluation framework for LLMs tailored to entity search is proposed, which not only complements existing evaluation frameworks but also provides valuable insights for the further optimization and application of these models. A cross-domain Chinese entity search test set is constructed and publicly released, on which nine open-source LLMs are tested to demonstrate their practical performance in entity search. Through comparative experiments, the performance of these LLMs is evaluated and analyzed from multiple perspectives, providing empirical evidence for their application in the entity search domain and offering new insights for future research.