Research on Scene Text Recognition Technology Based on Deep Learning

Chen Zhiyu (陈志宇)¹, Si Zhanjun (司占军)², Zhu Xinyu (朱新雨)¹

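Author information

1. College of Light Industry Science and Engineering, Tianjin University of Science and Technology, Tianjin 300457
2. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457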

Abstract

Scene text recognition (STR) based on deep learning is widely used, but its performance still needs improvement. Existing STR techniques recognize small text inaccurately and achieve low accuracy on Chinese and mixed Chinese-English text. To address the detection problem, the model was improved by adding a 104×104 feature scale, Focal Loss and GIoU Loss were used as loss functions to optimize the detection boxes, and the Convolutional Block Attention Module (CBAM) was embedded into the convolutional layers so that the network attends more closely to the target at specific spatial locations and channels while suppressing complex background information. Together, these changes improved the model's text detection capability. For recognition, the characteristics of Chinese characters were analyzed and the feature extraction network of CRNN was improved, which raised the accuracy of the original model on Chinese and mixed Chinese-English text. Experimental results showed that these improvements to the text detection and recognition models and algorithms significantly enhanced the accuracy and robustness of scene text recognition.
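
The two loss functions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard published definitions of GIoU loss and binary focal loss, assumes boxes in (x1, y1, x2, y2) form with positive area, and the function names and the defaults alpha=0.25 and gamma=2.0 are illustrative choices that do not come from this paper.

```python
import math


def giou_loss(box_a, box_b):
    """Return 1 - GIoU for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area (hypothetical helper, not from the paper).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width or height is clamped to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest box enclosing both inputs; the GIoU penalty is the share of
    # this enclosing box that is not covered by the union.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a predicted probability p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Unlike plain IoU, the GIoU term stays informative when the predicted and ground-truth boxes do not overlap, and the focal term reduces the contribution of easy examples relative to hard ones, which is the usual motivation for pairing the two in a detector.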

Key words

Deep learning/Scene Text Recognition technology/Image processing/Target detection/Text recognition

Publication year

2024

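Digital Printing (数字印刷)

China Institute of Printing Science and Technology (中国印刷科学技术研究所)

Peking University Core Journal (北大核心)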

ISSN: 2095-9540
