Structure recognition of instrument quotation spreadsheets with hybrid similarity metrics

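For instrument manufacturers, responding quickly and automatically to customers' quotation requests, and thereby achieving unmanned quotation, is of great importance. However, the bills of materials supplied by different customers follow no uniform format, so manufacturers receive only semi-structured quotation spreadsheets that an unmanned quotation system struggles to analyze and understand. The key to building such a system is the accurate automatic extraction of instrument parameters, and extracting parameters presupposes a correct understanding of the table structure. With the goal of building an unmanned quotation system, this paper therefore studies structure recognition for instrument quotation spreadsheets and proposes a table structure recognition method based on hybrid similarity metrics (hybrid similarity metrics for table structure recognition, HSMTSR). The method combines Levenshtein distance, the Dice coefficient, and cell type similarity (TySim), and parses and recognizes the table structure from the similarity of cells and of row data. In addition, a flowmeter spreadsheet dataset (FSDS) comprising 714 spreadsheets and 8,574 rows of data is built to study and analyze the structure of instrument quotation spreadsheets. Practical application shows that the method accurately and efficiently recognizes instrument quotation spreadsheets with a variety of complex structures and performs well on several evaluation metrics.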
Hybrid similarity metric for instrument quotation spreadsheet structure recognition
For instrumentation companies, responding quickly and efficiently to users' requests for quotation in an automated way, and thereby realizing unmanned quotation, is of great significance. Nevertheless, the bill of materials spreadsheets provided by different users follow no unified, standardized format, so instrumentation companies receive only semi-structured quotation spreadsheets, which are difficult for unmanned quotation systems to analyze. The key to building an unmanned quotation system is the accurate, automated extraction of instrument parameters, which presupposes a proper understanding of the spreadsheet structure. Therefore, with the goal of building an unmanned quotation system, this paper studies the structure recognition of instrument quotation spreadsheets and proposes hybrid similarity metrics for table structure recognition (HSMTSR). Combining Levenshtein distance, the Dice coefficient, and cell type similarity (TySim), the approach parses and identifies spreadsheet structures from the similarity of cells and of row data. In addition, a flowmeter spreadsheet dataset (FSDS) of 714 spreadsheets with 8,574 rows of data is built to analyze the structure of instrument quotation spreadsheets. Practical application shows that the method accurately and efficiently identifies instrument quotation spreadsheets with a variety of complex structures and achieves good results on several evaluation metrics.
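
The abstract names three ingredients (Levenshtein distance, the Dice coefficient, and cell type similarity) without giving formulas. As a rough illustration only, the following Python sketch shows one way such measures could be combined into a row-level score. The character-bigram form of Dice, the three coarse cell types, the equal weights, and all function names are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lev_similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient over character-bigram sets (an assumed tokenization)."""
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ga and not gb:
        # Strings too short to form bigrams: fall back to exact match.
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def cell_type(value: str) -> str:
    """Hypothetical coarse typing: 'empty', 'number', or 'text'."""
    v = value.strip()
    if not v:
        return "empty"
    try:
        float(v)
        return "number"
    except ValueError:
        return "text"


def type_similarity(row_a: Sequence[str], row_b: Sequence[str]) -> float:
    """Fraction of aligned columns whose coarse cell types agree."""
    n = max(len(row_a), len(row_b))
    if n == 0:
        return 1.0
    same = sum(cell_type(x) == cell_type(y) for x, y in zip(row_a, row_b))
    return same / n


def row_similarity(row_a: Sequence[str], row_b: Sequence[str],
                   weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted mix of the three measures; equal weights are an assumption."""
    text_a, text_b = " ".join(row_a), " ".join(row_b)
    w_lev, w_dice, w_type = weights
    return (w_lev * lev_similarity(text_a, text_b)
            + w_dice * dice_coefficient(text_a, text_b)
            + w_type * type_similarity(row_a, row_b))


if __name__ == "__main__":
    header = ["Tag", "Diameter", "Qty"]
    data_1 = ["FT-101", "50", "2"]
    data_2 = ["FT-102", "80", "1"]
    print(row_similarity(header, data_1))
    print(row_similarity(data_1, data_2))
```

Under these assumptions, a header row and a data row tend to score low on type similarity, while two data rows score high, which is the kind of signal a structure recognizer could use to separate header regions from data regions.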

Keywords: spreadsheets; structure recognition; similarity metrics; type similarity; instrument quotation

Authors: Xu Chuanyun (徐传运), Ma Yingli (马莹丽), Li Gang (李刚), Shu Tao (舒涛), Li Xingguang (李星光)

College of Liangjiang Artificial Intelligence, Chongqing University of Technology, Chongqing 401135

College of Computer and Information Science, Chongqing Normal University, Chongqing 401331

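Keywords (original Chinese terms): spreadsheets (电子表格); structure recognition (结构识别); similarity metrics (相似性度量); type similarity (类型相似度); instrument quotation (仪表询价)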

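Funding: Banan District Science and Technology Commission project, Chongqing (2020QC413); Chongqing Science and Technology Commission project (cstc2020jscxmsxmX0086); Chongqing Science and Technology Commission project (cstc2019jscxzdztzx0043); Chongqing Education Commission project (KJQN202001137); Chongqing University of Technology graduate innovation project (gzlcx20222137)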

2024

Journal of Chongqing University of Technology (重庆理工大学学报)
Chongqing University of Technology

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.567
ISSN: 1674-8425
Year, volume (issue): 2024, 38(1)