Research on Scene Text Detection and Recognition Algorithm Based on Improved MTSv2

In natural scene images, rich text content is important for a comprehensive understanding of the scene. To address the complex backgrounds, touching text, and multi-oriented text found in natural scene text images, a text detection and recognition algorithm based on an improved MTSv2 is proposed. The detection algorithm takes MTSv2 as the base network and adopts the convolutional block attention module (CBAM) to increase the weight of small text in the feature map, so that key features in the image are captured better. The channel enhancement feature pyramid network (CE-FPN) structure is integrated to alleviate the feature aliasing caused by multi-scale fusion. The focal loss function is introduced to reduce the influence of the imbalance between positive and negative samples on recognition accuracy, making the network focus more on hard-to-classify samples and improving the generalization ability of the model. The model is trained on multiple text datasets and validated on the ICDAR2015 dataset. For scene text detection and recognition, the improved model reaches an accuracy of 89.3%, a recall of 87.6%, and an F1 score of 88.5%, each somewhat higher than the original model.
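The abstract names focal loss but does not give its form or hyperparameters. As a minimal sketch, the block below implements the standard binary focal loss for a single prediction, with `gamma=2.0` and `alpha=0.25` taken from the original focal loss paper as assumptions (the values used in this work are not stated). The function names `binary_focal_loss` and `f1_score` are illustrative only. The second helper checks the reported metrics under the further assumption that the reported "accuracy" is the detection precision.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one prediction (illustrative sketch).

    p: predicted probability of the positive (text) class, in [0, 1]
    y: ground-truth label, 1 for text and 0 for background
    gamma, alpha: assumed defaults; the abstract does not state them
    """
    p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


easy = binary_focal_loss(0.95, 1)  # confidently correct positive
hard = binary_focal_loss(0.30, 1)  # poorly classified positive
print(f"easy={easy:.6f} hard={hard:.6f}")
print(f"F1 from the reported values: {f1_score(89.3, 87.6):.2f}")
```

With these settings the poorly classified sample contributes a far larger loss than the easy one, which is the sense in which the network is pushed toward hard-to-classify samples. The harmonic mean of 89.3 and 87.6 comes out at about 88.4, which agrees with the reported F1 of 88.5% to within the rounding of the inputs.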

scene text; text detection; text recognition; CBAM; CE-FPN; attention mechanism

Wang Yanyuan (王艳媛), Mao Zhengchong (茅正冲), Yang Yuhan (杨雨涵)

School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214000, China

Funding: National Natural Science Foundation of China; National Natural Science Foundation of China, Youth Program

619012066170185

2024

Computer Measurement & Control
China Computer Automatic Measurement and Control Technology Association

CSTPCD
Impact factor: 0.546
ISSN:1671-4598
Year, volume (issue): 2024, 32(9)