Journal of Chongqing University of Technology, 2024, Vol. 38, Issue (1): 150-159. DOI: 10.3969/j.issn.1674-8425(z).2024.01.017

Hybrid similarity metric for instrument quotation spreadsheet structure recognition

Xu Chuanyun¹, Ma Yingli², Li Gang², Shu Tao², Li Xingguang²

Author information

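  • 1. Liangjiang College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135; College of Computer and Information Science, Chongqing Normal University, Chongqing 401331
  • 2. Liangjiang College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135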

Abstract

For instrumentation companies, responding to users' requests for quotation quickly, efficiently, and automatically, that is, achieving unmanned quotation, is of great significance. However, the bill-of-materials spreadsheets provided by different users follow no unified, standardized format, so instrumentation companies receive only semi-structured quotation spreadsheets that unmanned quotation systems find difficult to analyze and understand. The key to building an unmanned quotation system is accurately and automatically extracting instrument parameters, and extracting those parameters presupposes a correct understanding of the spreadsheet structure. Therefore, with the goal of building an unmanned quotation system, this paper studies structure recognition for instrument quotation spreadsheets and proposes hybrid similarity metrics for table structure recognition (HSMTSR). The method combines the Levenshtein distance, the Dice coefficient, and cell type similarity (TySim), and parses and identifies the spreadsheet structure from the similarity between cells and between rows of data. In addition, a flowmeter spreadsheet dataset (FSDS) containing 714 spreadsheets with 8,574 rows of data is built to analyze the structure of instrument quotation spreadsheets. Practical applications show that the method automatically, accurately, and efficiently identifies instrument quotation spreadsheets with a variety of complex structures, and achieves good results on several evaluation metrics.
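The abstract names three ingredients (the Levenshtein distance, the Dice coefficient, and a cell type similarity) but does not give their formulas or how they are weighted. The sketch below is one plausible way to combine them and is not the authors' HSMTSR implementation. It assumes a Levenshtein similarity normalised by the longer string, Dice over character-bigram sets, a hypothetical three-way typing (`empty`/`number`/`text`) standing in for TySim, and hypothetical weights `WEIGHTS`. The helper `first_data_row` and its `threshold` are likewise hypothetical, illustrating how row similarity can separate header rows from data rows.

```python
import re
from itertools import zip_longest

# Hypothetical weights (Levenshtein, Dice, TySim); the paper's scheme may differ.
WEIGHTS = (0.4, 0.3, 0.3)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def levenshtein_sim(a: str, b: str) -> float:
    """Normalise the edit distance to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def bigrams(s: str) -> set[str]:
    """Character bigrams; a one-character string is its own single token."""
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X∩Y| / (|X| + |Y|) over character-bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def cell_type(s: str) -> str:
    """Hypothetical cell typing: empty, number (commas allowed) or text."""
    s = s.strip()
    if not s:
        return "empty"
    if re.fullmatch(r"[+-]?\d+(?:\.\d+)?", s.replace(",", "")):
        return "number"
    return "text"


def ty_sim(a: str, b: str) -> float:
    """Assumed TySim: 1 if both cells have the same type, else 0."""
    return 1.0 if cell_type(a) == cell_type(b) else 0.0


def cell_sim(a: str, b: str, weights=WEIGHTS) -> float:
    """Weighted mix of the three cell-level similarities."""
    w_lev, w_dice, w_type = weights
    return (w_lev * levenshtein_sim(a, b)
            + w_dice * dice(a, b)
            + w_type * ty_sim(a, b))


def row_sim(r1: list[str], r2: list[str], weights=WEIGHTS) -> float:
    """Mean cell similarity over aligned columns; short rows are padded with ''."""
    pairs = list(zip_longest(r1, r2, fillvalue=""))
    if not pairs:
        return 1.0
    return sum(cell_sim(a, b, weights) for a, b in pairs) / len(pairs)


def first_data_row(rows: list[list[str]], threshold: float = 0.5):
    """Hypothetical rule: data starts at the first row resembling the next one."""
    for i in range(len(rows) - 1):
        if row_sim(rows[i], rows[i + 1]) >= threshold:
            return i
    return None


if __name__ == "__main__":
    rows = [
        ["Item", "Model", "Size", "Qty"],
        ["1", "Electromagnetic flowmeter", "DN50", "2"],
        ["2", "Electromagnetic flowmeter", "DN80", "3"],
    ]
    print(round(row_sim(rows[1], rows[2]), 3))  # 0.575
    print(first_data_row(rows))                 # 1
```

With these sample rows the two data rows score 0.575, while the header row scores far lower against the first data row, so the sketch places the start of the data at index 1. Mixing an edit-distance measure, a bigram-overlap measure, and a type check lets cells such as "DN50" and "DN80" count as similar even though their text differs, which a pure string-equality test would miss.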

Key words

spreadsheets/structure recognition/similarity metrics/type similarity/instrument quotation


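Funding

Project of the Science and Technology Commission of Banan District, Chongqing (2020QC413)

Project of the Chongqing Municipal Science and Technology Commission (cstc2020jscxmsxmX0086)

Project of the Chongqing Municipal Science and Technology Commission (cstc2019jscxzdztzx0043)

Project of the Chongqing Municipal Education Commission (KJQN202001137)

Chongqing University of Technology Graduate Innovation Project (gzlcx20222137)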

Year of publication

2024
Journal of Chongqing University of Technology
Chongqing University of Technology

CSTPCD, Peking University core journal
Impact factor: 0.567
ISSN: 1674-8425
References: 10