Layout Analysis Method of Multi-scale Feature Fusion

Qiao Jia (乔佳)¹, Xu Kun (徐琨)¹, Hu Peirong (胡佩蓉)¹


Author information

1. School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710018

Abstract

To address list/text misclassification, the difficulty of recognizing small-scale text in tables, and the poor preservation of spatial features in current document layout element analysis, this paper takes a bottom-up approach and proposes a multi-feature fusion layout analysis method based on the SegNet network. The method introduces an MSCAN-SE module into SegNet. To address the low recognition rate of small-scale elements in tables, the strip features in the MSCAN-SE attention mechanism are used to improve the model's multi-scale feature extraction, so that the network retains feature information at more scales. To address the excessive similarity between the features of list elements and text elements, the dilated convolution and the channel attention branch in MSCAN-SE are used to enlarge the network's receptive field during feature extraction. The proposed method is compared experimentally with classical semantic segmentation networks. The results show that, on the layout analysis test set, it achieves a pixel accuracy of 97.9% and a mean intersection over union (mIoU) of 91.7%. Compared with the U-Net, FCN, DeepLabV3+, and SegNet semantic segmentation models, the mIoU is higher by 7.6%, 2.4%, 2.6%, and 1.5%, respectively.
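The two reported metrics, pixel accuracy and mean intersection over union, can be illustrated with a small sketch. The abstract does not give the formulas, so the code below assumes the standard confusion-matrix definitions. The function names, the three class ids, and the toy label sequences are hypothetical and are not taken from the paper.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class equals the ground truth."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, labelled otherwise
        fn = sum(cm[c]) - tp                       # labelled c, predicted otherwise
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy flattened label maps (hypothetical class ids 0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
pa = pixel_accuracy(cm)
miou = mean_iou(cm)
print(f"pixel accuracy = {pa:.3f}, mIoU = {miou:.3f}")
```

In practice the flattened lists would be replaced by the per-pixel class maps of the whole test set, with the confusion matrix accumulated over all images before the two metrics are computed.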

Key words

document layout analysis/multi-scale attention/semantic segmentation/channel attention


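Funding

National Natural Science Foundation of China (52172302)

National Key Research and Development Program of China (2019YFB1600103)

Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-044)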

Publication year

2024

Computer and Modernization (计算机与现代化)

Jiangxi Computer Society; Jiangxi Institute of Computing Technology


CSTPCD

Impact factor: 0.472

ISSN: 1006-2475
