LGDNet: Table Detection Network Combining Local and Global Features
In the era of big data, tables are widely present in various document images, and table detection is of great significance for the reuse of table information. To address the limited receptive field, the reliance on predefined proposals, and the inaccurate table boundary localization of existing table detection algorithms based on convolutional neural networks, this paper proposes a table detection network built on the DINO model. First, an image preprocessing method is designed to enhance the corner and line features of tables, enabling more precise table boundary localization and better differentiation between tables and other document elements such as text. Second, a backbone network, SwTNet-50, is designed: Swin Transformer Blocks (STB) are introduced into ResNet to effectively combine local and global features, improving the feature extraction ability of the model and the detection accuracy of table boundaries. Finally, to compensate for the insufficient encoder feature learning caused by one-to-one matching and the limited positive-sample supervision in the DINO model, a collaborative hybrid assignments training strategy is adopted to strengthen encoder feature learning and improve detection precision. Compared with various deep-learning-based table detection methods, the proposed model outperforms the compared algorithms on the TNCR table detection dataset, achieving F1-scores of 98.2%, 97.4%, and 93.3% at IoU thresholds of 0.5, 0.75, and 0.9, respectively. On the IIIT-AR-13K dataset, the F1-score is 98.6% at an IoU threshold of 0.5.
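The abstract does not spell out the preprocessing pipeline, but the idea of emphasizing table ruling lines and their intersections (corner candidates) before detection can be sketched with standard OpenCV morphology. The kernel sizes, blending weights, and function name below are illustrative assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def enhance_table_structure(image_bgr, line_scale=40):
    """Highlight ruling lines and their intersections (corner candidates)
    before feeding the image to the detector.

    Minimal illustrative sketch only; kernel sizes and blending weights
    are assumptions, not the paper's preprocessing method.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Invert and binarize so dark ruling lines become foreground.
    binary = cv2.adaptiveThreshold(
        255 - gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2)

    h, w = binary.shape
    # Directional openings with long, thin kernels keep only long strokes.
    horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (max(1, w // line_scale), 1))
    vert_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(1, h // line_scale)))
    horiz = cv2.morphologyEx(binary, cv2.MORPH_OPEN, horiz_kernel)
    vert = cv2.morphologyEx(binary, cv2.MORPH_OPEN, vert_kernel)

    lines = cv2.bitwise_or(horiz, vert)      # all ruling lines
    corners = cv2.bitwise_and(horiz, vert)   # line intersections ~ table corners
    corners = cv2.dilate(corners, np.ones((3, 3), np.uint8))

    # Blend the structure maps back into the input so these regions stand out.
    overlay = cv2.merge([lines, lines, corners])
    return cv2.addWeighted(image_bgr, 1.0, overlay, 0.5, 0)
```

Long horizontal and vertical strokes survive the directional openings while ordinary text does not, so the blended output makes table borders and corner points more salient to the downstream detector.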
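Similarly, the local-global idea behind SwTNet-50 (convolutional features refined with transformer self-attention) can be illustrated with a minimal PyTorch sketch. Note that the paper interleaves windowed Swin Transformer Blocks inside ResNet, whereas the toy module below appends a single global self-attention layer after the last ResNet-50 stage; the embedding width, head count, and placement are assumptions made only for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class LocalGlobalBackbone(nn.Module):
    """Toy sketch of the local-global idea: ResNet-50 extracts local
    (convolutional) features, then a transformer encoder layer mixes
    global context across all spatial positions.

    Not the SwTNet-50 architecture from the paper; dimensions and the
    placement of the attention block are assumptions.
    """

    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        cnn = resnet50(weights=None)
        # Keep everything up to (and including) the last residual stage.
        self.stem_and_stages = nn.Sequential(*list(cnn.children())[:-2])
        self.proj = nn.Conv2d(2048, embed_dim, kernel_size=1)
        self.global_block = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)

    def forward(self, x):
        feat = self.stem_and_stages(x)            # (B, 2048, H/32, W/32): local features
        feat = self.proj(feat)                    # (B, C, H', W')
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H'*W', C)
        tokens = self.global_block(tokens)        # global self-attention over positions
        return tokens.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    model = LocalGlobalBackbone()
    out = model(torch.randn(1, 3, 512, 512))
    print(out.shape)  # torch.Size([1, 256, 16, 16])
```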

Table detection; Convolutional Neural Network (CNN); Transformer; Feature extraction

Lu Di, Yuan Xuan


School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080

Heilongjiang Provincial Key Laboratory of Pattern Recognition and Information Perception, Harbin University of Science and Technology, Harbin 150080


2024

Journal of Electronics & Information Technology (电子与信息学报)
Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China


CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 1.302
ISSN: 1009-5896
Year, Volume (Issue): 2024, 46(12)