TextFormer: A Query-based End-to-end Text Spotter with Mixed Supervision

End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework. Typical methods heavily rely on region-of-interest (RoI) operations to extract local features and complex post-processing steps to produce final predictions. To address these limitations, we propose TextFormer, a query-based end-to-end text spotter with a transformer architecture. Specifically, using query embedding per text instance, TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multitask modeling. It allows for mutual training and optimization of classification, segmentation and recognition branches, resulting in deeper feature sharing without sacrificing flexibility or simplicity. Additionally, we design an adaptive global aggregation (AGG) module to transfer global features into sequential features for reading arbitrarily-shaped texts, which overcomes the suboptimization problem of RoI operations. Furthermore, potential corpus information is utilized from weak annotations to full labels through mixed supervision, further improving text detection and end-to-end text spotting results. Extensive experiments on various bilingual (i.e., English and Chinese) benchmarks demonstrate the superiority of our method. Especially on the TDA-ReCTS dataset, TextFormer surpasses the state-of-the-art method in terms of 1-NED by 13.2%.
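
The abstract reports its headline result in terms of 1-NED. This metric is commonly understood as one minus the average normalized edit distance between predicted and ground-truth transcriptions. The sketch below illustrates that common definition only and is not the authors' evaluation code. The function names `edit_distance` and `one_minus_ned` are hypothetical, and normalizing each distance by the length of the longer string is an assumption based on the usual protocol for Chinese scene-text benchmarks.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def one_minus_ned(predictions: Sequence[str], references: Sequence[str]) -> float:
    """1 - mean(edit_distance / max(len(pred), len(ref))) over all pairs.

    Assumption: each pair is normalized by the longer of the two strings,
    and a pair of two empty strings contributes a distance of 0.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        raise ValueError("at least one pair is required")
    total = 0.0
    for pred, ref in zip(predictions, references):
        longest = max(len(pred), len(ref))
        if longest > 0:
            total += edit_distance(pred, ref) / longest
    return 1.0 - total / len(predictions)
```

Under this definition a score of 1.0 means every transcription matches exactly, and each character-level error lowers the score in proportion to the length of the text instance it occurs in.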

End-to-end text spotting, arbitrarily-shaped texts, transformer, mixed supervision, multitask modeling

Yukun Zhai, Xiaoqiang Zhang, Xiameng Qin, Sanyuan Zhao, Xingping Dong, Jianbing Shen

School of Computer Science, Beijing Institute of Technology, Beijing 100081, China

Department of Computer Vision Technology, Baidu Inc., Beijing 100193, China

National Natural Science Foundation of China (No. 61902027)

2024

Machine Intelligence Research
Institute of Automation, Chinese Academy of Sciences

CSTPCD, EI
Impact factor: 0.49
ISSN: 2731-538X
Year, volume (issue): 2024, 21(4)