Research on scan identification technology of detection reports

Scanned detection reports vary widely in format, are often poorly scanned, and contain irregular handwritten signatures, all of which degrade recognition quality. To address these problems, a recognition solution for scanned detection reports is proposed. First, the deep convolutional neural network VGG16 is used to correct the report orientation. Second, the image-to-text generation method Table-Master is introduced to analyze the report structure. Third, the Connectionist Text Proposal Network (CTPN) is introduced to detect text positions in the report, and the sequence text recognition model Master is used to recognize the report text. Finally, a multi-feature fusion classification model, MFFC, which combines text and position features, extracts key information from the recognition results. Experimental results show that the scheme outperforms the other recognition schemes on every evaluation metric. It effectively extracts text from scanned reports, identifies and extracts their structured information, and improves the efficiency of digitally entering scanned copies.
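The orientation-correction step can be illustrated with a minimal sketch. The abstract states only that VGG16 corrects the report direction, so the four-class formulation used here (class k means the page was scanned rotated k × 90° clockwise) is an assumption, and `correct_orientation` and its helpers are hypothetical names rather than the authors' implementation. The classifier itself is left out; only the rotation that would follow its prediction is shown.

```python
from typing import List

Image = List[List[int]]  # row-major grid of pixel values


def rotate_cw(img: Image) -> Image:
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_ccw(img: Image) -> Image:
    """Rotate a row-major image 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def correct_orientation(img: Image, predicted_class: int) -> Image:
    """Undo a clockwise rotation of predicted_class * 90 degrees.

    predicted_class is assumed to come from a 4-way orientation
    classifier (0, 1, 2, 3 for 0, 90, 180, 270 degrees); that class
    layout is an assumption, not something the abstract specifies.
    """
    if predicted_class not in (0, 1, 2, 3):
        raise ValueError("predicted_class must be 0, 1, 2, or 3")
    for _ in range(predicted_class):
        img = rotate_ccw(img)
    return img
```

In a real pipeline the same correction would be applied to the scanned page image before table-structure analysis and text detection, so that the later stages always see upright text.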

identification of detection report; deep learning; information identification; information extraction; PaddlePaddle

Hong Huajun, Wang Chunyan

China Ship Scientific Research Center, Wuxi, Jiangsu 214000

CSIC Aolantuo Wuxi Software Technology Co., Ltd., Wuxi, Jiangsu 214000

Open Fund for Innovative Research on Overall Ship Performance (25422217)

2024

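Information Technology
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China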

CSTPCD
Impact factor: 0.413
ISSN: 1009-2552
Year, volume (issue): 2024, (9)