Electronic Design Engineering (电子设计工程), 2025, Vol. 33, Issue 2: 33-37. DOI: 10.14022/j.issn1674-6236.2025.02.007

Fast Retrieval Technology for Engineering Archive Data Based on Classification Recognition

(Original title: 基于分类识别的工程档案数据快速检索技术)

Wang Jianzhong (王建忠) 1, Wu Xinda (吴昕达) 1

Author information

  • 1. Zhejiang Ninghai Pumped Storage Co., Ltd., Ningbo, Zhejiang 315621, China

Abstract

Traditional archive database retrieval methods suffer from long data processing times and low efficiency, and traditional storage such as relational databases is costly and not robust. Building on the distributed storage database HBase, this paper proposes a fast retrieval model for engineering archive data based on classification recognition. The model consists mainly of a data classification and recognition module and a fast data retrieval module. The classification and recognition module addresses the insensitivity of the traditional TF-IDF algorithm to word position information: it improves TF-IDF with inter-class and intra-class methods and combines it with a naive Bayes network to raise classification accuracy. The fast retrieval module uses a CNN and an LSTM to extract data features and a hashing algorithm to generate hash codes, which speeds up retrieval. In experimental testing, the improved TF-IDF algorithm achieved the best accuracy, recall, and F1 score on the different datasets, retrieval time fell by more than 10%, and robustness was strong. The results indicate that the proposed method outperforms traditional approaches and is both efficient and stable.
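The position insensitivity that the abstract attributes to traditional TF-IDF can be seen in a minimal sketch. This uses the textbook weighting tf × log(N / df), not the improved inter-class and intra-class variant from the paper, whose formulas the abstract does not give. The function name `tf_idf` and the sample documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Textbook TF-IDF: tf = count / document length, idf = log(N / df).

    Returns one {term: weight} dict per document. This is an illustrative
    baseline, not the improved algorithm described in the paper.
    """
    n_docs = len(docs)
    tokenized = [doc.split() for doc in docs]
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical archive snippets: the first two contain the same words
# in a different order; the third is unrelated.
docs = [
    "turbine inspection report pump",
    "pump report inspection turbine",
    "contract payment schedule",
]
vectors = tf_idf(docs)

# Word order is discarded, so the first two vectors are identical.
print(vectors[0] == vectors[1])  # prints True
```

Because the weights depend only on term counts, any reordering of a document yields the same vector, which is the shortcoming the paper's improved weighting is meant to address.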

Keywords

distributed database / TF-IDF / deep hashing algorithm / naive Bayes / data retrieval

Publication year

2025
Electronic Design Engineering (电子设计工程)
Xi'an Sancai Technology Industrial Co., Ltd. (西安三才科技实业有限公司)

Impact factor: 0.333
ISSN: 1674-6236