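To address the false and missed detections caused by curved, dense text in commodity packaging text detection, a text detection framework based on link relationship prediction and composed of two sub-networks is proposed (text detection network based on relational prediction, RPTNet). In the text component detection network, the downsampling path uses a dual-branch structure in which a convolutional neural network and self-attention run in parallel to extract local and global features, and a dilated feature enhancement module (DFM) is added to reduce the information lost from deep feature maps during dimensionality reduction. The upsampling path combines a feature pyramid with a multi-level attention fusion module (MAFM) to fuse multi-level features and strengthen the latent connections between text features, and a text detector then detects text components from the upsampled feature maps. In the link relationship prediction network, a relational reasoning framework based on a graph convolutional network predicts the deep similarity between text components, and a bi-directional long short-term memory network aggregates the text components into text instances. To verify the detection performance of RPTNet, a text detection dataset composed of commodity packaging images was constructed (text detection dataset composed of commodity packaging, CPTD1500). Experimental results show that RPTNet not only achieves excellent performance on the public text datasets CTW-1500 and Total-Text, but also reaches a recall of 85.4% and an F value of 87.5% on CPTD1500, both better than current mainstream algorithms.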
Detection of curved and dense text on commodity packaging based on link relationship prediction
A detection framework consisting of two sub-networks, the text detection network based on relational prediction (RPTNet), is proposed to address the false and missed detections caused by curved and dense text in the text detection task for commodity packaging images. In the text component detection network, local and global features are extracted in the downsampling path by a convolutional neural network branch and a self-attention branch running in parallel. A dilated feature enhancement module (DFM) is added to the downsampling path to reduce the information loss in the deep feature maps. In the upsampling path, the feature pyramid network is combined with the multi-level attention fusion module (MAFM) to strengthen the connections between features at different levels, and the text detector detects the text components from the upsampled feature maps. In the link relationship prediction network, a relational reasoning framework based on a graph convolutional network is used to predict the deep similarity between each text component and its neighbors, and a bi-directional long short-term memory network is used to aggregate the text components into text instances. To verify the detection performance of RPTNet, a text detection dataset, CPTD1500, composed of commodity packaging images is constructed. The test results show that RPTNet is effective on two publicly available text datasets, CTW-1500 and Total-Text, and that its recall and F value on CPTD1500 are 85.4% and 87.5%, respectively, which are superior to those of current mainstream algorithms.
text detection; convolutional neural network; self-attention; feature fusion; graph convolutional network; bi-directional long short-term memory network
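To make the idea of linking text components into text instances more concrete, the following is a minimal sketch and not the RPTNet implementation. It assumes that pairwise link scores between components are already available (in the abstract these would come from the graph-convolution-based relational reasoning), and it replaces the bi-directional long short-term memory aggregation with a much simpler stand-in: thresholding the scores and merging linked components with union-find. The function name `group_components`, the threshold of 0.5, and the scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_components(num_components, link_scores, threshold=0.5):
    """Group component indices into text instances.

    link_scores maps an index pair (i, j) to a hypothetical link
    probability. Pairs scoring at or above the threshold are merged.
    This is a simplified stand-in, not the aggregation used by RPTNet.
    """
    parent = list(range(num_components))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in link_scores.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect the components that share a root into one instance.
    groups = defaultdict(list)
    for idx in range(num_components):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Made-up scores for five components; the weak link (2, 3) separates
# the components into two instances.
scores = {(0, 1): 0.92, (1, 2): 0.81, (2, 3): 0.10, (3, 4): 0.77}
instances = group_components(5, scores)
print(instances)
```

With these illustrative scores, components 0 to 2 end up in one instance and components 3 and 4 in another. A hard threshold ignores the order of components along a text line, which is one reason a sequence model such as the one named in the abstract may be preferable for curved text.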