Research on the Application of a Title Similarity Calculation Model in Quality Control of Characteristic Literature Data

In building special-collection literature resources, acquisition pre-order records often have non-standard metadata descriptions, incomplete fields, and irregular input, and they arrive through many acquisition channels, all of which make duplicate checking difficult. This paper proposes a duplicate-checking model based on title similarity: titles are preprocessed, word2vec is used to extract a feature vector for each title, and the cosine similarity between title vectors is computed to identify duplicate documents. Experimental results show that the model performs well and offers a feasible reference for building special-collection literature resources in libraries.
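The pipeline described in the abstract (preprocess titles, embed them with word2vec, compare by cosine similarity) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy vectors stand in for a trained word2vec model, averaging word vectors into a title vector is an assumption (the abstract does not say how title vectors are formed), and the names `TOY_VECTORS`, `title_vector`, `is_duplicate`, and the `0.95` threshold are hypothetical. Chinese titles would also need a word segmentation step before lookup.

```python
import math
import re

# Invented 3-dimensional vectors standing in for a trained word2vec model.
# Real word2vec vectors typically have 100+ dimensions.
TOY_VECTORS = {
    "guizhou": [0.9, 0.1, 0.0],
    "folk": [0.1, 0.8, 0.2],
    "songs": [0.2, 0.7, 0.3],
    "collection": [0.3, 0.2, 0.9],
    "railway": [-0.8, 0.1, 0.5],
    "engineering": [-0.7, -0.2, 0.6],
    "handbook": [0.1, -0.3, 0.8],
}


def preprocess(title):
    """Lowercase the title and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def title_vector(title, vectors):
    """Average the vectors of known tokens (assumed pooling strategy)."""
    known = [vectors[t] for t in preprocess(title) if t in vectors]
    if not known:
        return None  # no token of this title is in the vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(title_a, title_b, vectors, threshold=0.95):
    """Flag two titles as likely duplicates (threshold is hypothetical)."""
    vec_a = title_vector(title_a, vectors)
    vec_b = title_vector(title_b, vectors)
    if vec_a is None or vec_b is None:
        return False
    return cosine_similarity(vec_a, vec_b) >= threshold
```

With these toy vectors, `is_duplicate("Guizhou Folk Songs: A Collection", "guizhou folk songs collection", TOY_VECTORS)` flags the pair as a duplicate despite the differences in case and punctuation, while an unrelated title such as "Railway Engineering Handbook" falls well below the threshold. In practice the threshold would have to be tuned on labeled acquisition records.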

special collection; metadata; title check; word2vec; cosine similarity

JIN Guanglong, ZHANG Guangzhao, ZHANG Yinling, YANG Fan

Guizhou University of Finance and Economics Library, Guiyang, Guizhou 550025

2022 university-level project of Guizhou University of Finance and Economics

2022KYYB14

2024

Changjiang Information & Communications
Hubei Communication Services Company

Impact factor: 0.338
ISSN: 2096-9759
Year, volume (issue): 2024, 37(2)