Open-set Text Recognition via Part-based Similarity

Liu Chang¹, Yang Chun¹, Yin Xu-Cheng¹

Author information

1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083

Abstract

Open-set text recognition (OSTR) is an emerging task that aims to address language-model bias and the recognition and rejection of novel characters in open-world text recognition applications. Recent OSTR methods mitigate language-model bias by decoupling the potentially biased contextual information from visual information. However, they tend to overlook the importance of the visual details of characters. Because contextual information is biased, local visual details become more important for distinguishing visually similar characters. This work proposes an open-set text recognition framework based on adaptive character-part representations, together with an open-set text recognition method via part-based similarity, which improves the modeling of local visual details by explicitly modeling different character parts. Unlike radical-based methods, the proposed framework adopts a data-driven part design; hence it is language-agnostic and can generalize across languages. A localization constraint is further proposed as a regularization term to stabilize model training. In extensive comparative experiments, the full framework consistently outperforms its baseline and performs well on both open-set and conventional closed-set text recognition benchmarks.

Key words

Open-set text recognition (OSTR) / open-set learning / generalized zero-shot learning / composition learning

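Funding

National Science and Technology Major Project on New Generation Artificial Intelligence (2020AAA0109701)

National Science Fund for Distinguished Young Scholars (62125601)

National Natural Science Foundation of China (62076024)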

Publication year

2024

Acta Automatica Sinica

Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.762

ISSN: 0254-4156