Research on the Application of Title Similarity Calculation Model in Quality Control of Characteristics Literature Data
In building provincial special-collection literature resources, acquisition and ordering work is hampered by non-standard metadata descriptions, incomplete fields, inconsistent data entry, and a wide range of acquisition channels, all of which make duplicate checking difficult. This paper proposes a duplicate-checking model based on title similarity: after data preprocessing, word2vec is used to extract a feature vector for each title, and the cosine similarity between titles is calculated in order to detect duplicate titles. The experimental results show that the model performs well and provides a feasible reference for libraries building special-collection literature resources.
special collection; metadata; title check; word2vec; cosine similarity
JIN Guanglong, ZHANG Guangzhao, ZHANG Yinling, YANG Fan
Guizhou University of Finance and Economics Library, Guiyang, Guizhou 550025
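
The pipeline outlined in the abstract (preprocess the title, turn it into a feature vector, compare titles by cosine similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the toy three-dimensional vectors stand in for a trained word2vec model, the title vector is formed by averaging word vectors (the abstract does not say how it is built), and the 0.95 threshold and the function names are hypothetical.

```python
import math
import re

# Toy 3-dimensional embeddings, invented for illustration only.
# In practice these would be looked up in a trained word2vec model.
TOY_VECTORS = {
    "guizhou": [0.9, 0.1, 0.0],
    "local": [0.7, 0.3, 0.1],
    "chronicle": [0.1, 0.9, 0.2],
    "gazetteer": [0.2, 0.8, 0.3],
    "cooking": [0.0, 0.1, 0.9],
}


def preprocess(title):
    """Lowercase the title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def title_vector(title, vectors):
    """Average the vectors of known tokens (an assumed pooling choice).

    Returns None when no token of the title has a vector.
    """
    known = [vectors[t] for t in preprocess(title) if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(title_a, title_b, vectors=TOY_VECTORS, threshold=0.95):
    """Flag two titles as likely duplicates (threshold is hypothetical)."""
    vec_a = title_vector(title_a, vectors)
    vec_b = title_vector(title_b, vectors)
    if vec_a is None or vec_b is None:
        return False
    return cosine_similarity(vec_a, vec_b) >= threshold


if __name__ == "__main__":
    print(is_duplicate("Guizhou Local Chronicle", "guizhou local chronicle!"))
    print(is_duplicate("Guizhou Chronicle", "Cooking"))
```

For Chinese titles, the `preprocess` step would presumably need word segmentation (for example with a segmenter such as jieba) instead of the regular expression used here, and the threshold would have to be tuned on labelled duplicate pairs.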