
Fast retrieval technology for engineering archive data based on classification recognition

Traditional archive database retrieval methods suffer from long data processing times and low efficiency, and traditional storage approaches such as relational databases are costly and lack robustness. Building on the distributed storage database HBase, this paper proposes a fast retrieval model for engineering archive data based on classification recognition. The model consists mainly of a data classification and recognition module and a fast data retrieval module. The classification and recognition module addresses the insensitivity of the traditional TF-IDF algorithm to word position information by improving it with inter-class and intra-class methods, and combines the improved algorithm with a Naive Bayes network to raise classification and recognition accuracy. The fast retrieval module uses CNN and LSTM for feature extraction and a hash algorithm to generate hash codes for the data, which improves retrieval speed. In experimental tests, the improved TF-IDF algorithm achieved the best accuracy, recall, and F1 score on different datasets, retrieval time was reduced by more than 10%, and robustness was strong. The experimental results indicate that the proposed method outperforms traditional approaches and is both efficient and stable.
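The hash-code retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the hashing scheme or the network architecture, so everything below is an assumption: the feature vectors stand in for CNN/LSTM outputs, random-hyperplane sign hashing is swapped in for the paper's learned hash function, and the names `make_projections`, `hash_code`, `hamming`, and `search` are hypothetical.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes of dimension dim (assumed stand-in for a learned hash layer)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(features, projections):
    """Pack the sign of each projection of the feature vector into an integer bit string."""
    code = 0
    for plane in projections:
        dot = sum(f * w for f, w in zip(features, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def search(query_code, index, top_k=3):
    """Return the ids of the top_k entries whose codes are closest to the query in Hamming distance."""
    ranked = sorted(index.items(), key=lambda kv: hamming(query_code, kv[1]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical feature vectors; in the paper these would come from the CNN and LSTM.
features = {
    "doc_a": [0.9, 0.1, 0.0, 0.3],
    "doc_b": [0.8, 0.2, 0.1, 0.4],
    "doc_c": [-0.7, 0.9, -0.5, 0.0],
}
planes = make_projections(dim=4, n_bits=16)
index = {doc_id: hash_code(vec, planes) for doc_id, vec in features.items()}
print(search(hash_code(features["doc_a"], planes), index, top_k=2))
```

Comparing compact binary codes with XOR and a bit count is much cheaper than comparing full feature vectors, which is the general reason hash codes speed up retrieval. The sketch scans every stored code; a deployment on HBase would need its own row-key and indexing design, which the abstract does not describe.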

distributed database; TF-IDF; deep hash algorithm; Naive Bayes; data retrieval

Wang Jianzhong (王建忠), Wu Xinda (吴昕达)


Zhejiang Ninghai Pumped Storage Co., Ltd. (浙江宁海抽水蓄能有限公司), Ningbo, Zhejiang 315621


2025

Electronic Design Engineering (电子设计工程)
Xi'an Sancai Technology Industrial Co., Ltd. (西安三才科技实业有限公司)


Impact factor: 0.333
ISSN: 1674-6236
Year, volume (issue): 2025, 33(2)