An Empirical Study of Semantics-Based Multi-Layered Automatic Book Classification

Gao Bin (高斌)¹, Ma Juhong (马菊红)¹, Gu Ting (顾婷)¹

Author information

1. Jiangsu University of Science and Technology Library

Abstract

To address the consistency and efficiency problems of manual book classification in libraries, this study applies a multi-layered automatic book classification system to library classification work and introduces semantic concepts as a strategy for improving classification quality. Because the amount of data or the number of document features may be insufficient during classification, the strategy uses Word2Vec, which preserves the semantic relationships between a target word and its context, to expand semantically related words into feature words and thereby improve classification performance. Using data collected from the library's Cxstar (畅想之星) Chinese e-book collection, four classifiers (Naive Bayes, support vector machine (SVM), C4.5 decision tree, and K-nearest neighbors (KNN)) are applied to the multi-layered automatic book classification system. For the semantic component, Word2Vec is used to train on a corpus, a thesaurus-like synonym dictionary is built from the training results, and the feature words of the classification data set are then expanded. Classification performance is evaluated by accuracy. The experimental results show that the multi-layered automatic book classification system performs well in library classification and outperforms traditional automatic book classification, and that the proposed strategy does improve the accuracy of book classification.
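The pipeline described in the abstract (expand feature words through a synonym dictionary, then classify layer by layer) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the hand-written `SYNONYMS` dictionary stands in for a thesaurus derived from Word2Vec training, the two-layer reading of "multi-layered" (top-level class first, then a subclass within it), the class labels `T`/`TP`/`TN` and `I`/`I1`/`I2`, the toy training titles, and the small Naive Bayes implementation, which is only one of the four classifier types the study names.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym dictionary. In the study this role is played by a
# thesaurus-like dictionary built from Word2Vec training results.
SYNONYMS = {"coding": ["programming"], "wireless": ["radio"], "verse": ["poetry"]}


def expand_features(tokens, synonyms):
    """Return the tokens followed by the dictionary synonyms of each token."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(synonyms.get(tok, []))
    return expanded


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


class MultiLayerClassifier:
    """Layer 1 predicts the top-level class; layer 2 predicts a subclass in it."""

    def fit(self, docs, paths):
        # Each path is a (top_class, sub_class) tuple.
        self.top = NaiveBayes().fit(docs, [top for top, _ in paths])
        grouped = defaultdict(lambda: ([], []))
        for tokens, (top, sub) in zip(docs, paths):
            grouped[top][0].append(tokens)
            grouped[top][1].append(sub)
        self.sub = {top: NaiveBayes().fit(d, s) for top, (d, s) in grouped.items()}
        return self

    def predict(self, tokens):
        top = self.top.predict(tokens)
        return top, self.sub[top].predict(tokens)


# Toy training data (invented for illustration): tokenized titles and class paths.
TRAIN_DOCS = [
    ["python", "programming", "algorithm"],
    ["database", "software", "programming"],
    ["antenna", "radio", "signal"],
    ["circuit", "signal", "telecom"],
    ["novel", "chinese", "fiction"],
    ["poetry", "tang", "chinese"],
    ["novel", "world", "translation"],
    ["drama", "world", "shakespeare"],
]
TRAIN_PATHS = [
    ("T", "TP"), ("T", "TP"), ("T", "TN"), ("T", "TN"),
    ("I", "I2"), ("I", "I2"), ("I", "I1"), ("I", "I1"),
]

clf = MultiLayerClassifier().fit(
    [expand_features(doc, SYNONYMS) for doc in TRAIN_DOCS], TRAIN_PATHS
)

query = ["verse", "anthology"]  # neither word occurs in the training titles
print(clf.predict(expand_features(query, SYNONYMS)))  # ('I', 'I2')
```

In this toy setup the query shares no word with the training titles, so the classifier has nothing to distinguish the classes until the expansion step adds `poetry`. That is the sparse-feature situation the abstract says semantic expansion is meant to help with.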

Keywords

classification number / multi-layered / automatic book classification / Word2Vec

Funding

2024 General Project of Philosophy and Social Science Research in Jiangsu Universities (2024SJYB1620)

Publication year

2024

Research on Library Science (图书馆学研究)

Jilin Provincial Library

CSTPCD / CSSCI / CHSSCD / Peking University Core Journals

Impact factor: 1.563

ISSN: 1001-0424

Reference count: 2
