
RTTVTS: Real-Time End-to-End Video Text Tracking

Video text tracking consists mainly of two subtasks, detection and tracking. Existing models fail to fully capture the semantic connections between consecutive video frames and neglect the real-time requirements of the task. To address these issues, this paper proposes a real-time end-to-end video text tracking model (RTTVTS), which performs end-to-end video text tracking through prediction learning across multiple consecutive frames, addressing the dynamic detection and continuous tracking of text in video. First, a computationally efficient feature enhancement network is built from stacked feature pyramid enhancement modules. Second, a lightweight detection head, working together with pixel aggregation, captures and learns detection information between consecutive video frames. Finally, in the inference stage, Kalman filtering is used to associate the detection boxes. Experimental results show that the proposed RTTVTS model improves both the effectiveness and the real-time performance of video text tracking.
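The abstract states only that Kalman filtering is combined with the detector at inference time to associate detection boxes; it does not give the state vector, the noise settings, or the matching rule. The sketch below is therefore an illustration under assumptions, not the authors' method: it assumes axis-aligned boxes `(x1, y1, x2, y2)`, a constant-velocity filter on each box-center coordinate, and greedy IoU matching. The names `AxisKalman`, `Track`, `Tracker`, `iou`, and the parameters `q`, `r`, and `iou_thresh` are all hypothetical.

```python
class AxisKalman:
    """1-D constant-velocity Kalman filter with state (position, velocity)."""

    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = [pos, 0.0]                    # state estimate
        self.P = [[10.0, 0.0], [0.0, 10.0]]    # state covariance
        self.q, self.r = q, r                  # assumed process / measurement noise

    def predict(self):
        # x <- F x and P <- F P F^T + q I, with F = [[1, 1], [0, 1]]
        self.x = [self.x[0] + self.x[1], self.x[1]]
        (a, b), (c, d) = self.P
        self.P = [[a + b + c + d + self.q, b + d], [c + d, d + self.q]]
        return self.x[0]

    def update(self, z):
        # Only the position is measured, so H = [1, 0]
        (a, b), (c, d) = self.P
        s = a + self.r                         # innovation variance
        k0, k1 = a / s, c / s                  # Kalman gain
        y = z - self.x[0]                      # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class Track:
    """One tracked text instance: filtered center plus last observed size."""

    def __init__(self, box):
        x1, y1, x2, y2 = box
        self.w, self.h = x2 - x1, y2 - y1
        self.kx = AxisKalman((x1 + x2) / 2)
        self.ky = AxisKalman((y1 + y2) / 2)

    def predict(self):
        cx, cy = self.kx.predict(), self.ky.predict()
        return (cx - self.w / 2, cy - self.h / 2, cx + self.w / 2, cy + self.h / 2)

    def update(self, box):
        x1, y1, x2, y2 = box
        self.w, self.h = x2 - x1, y2 - y1
        self.kx.update((x1 + x2) / 2)
        self.ky.update((y1 + y2) / 2)


class Tracker:
    """Greedy IoU association between predicted tracks and new detections."""

    def __init__(self, iou_thresh=0.3):
        self.tracks = {}
        self.next_id = 0
        self.iou_thresh = iou_thresh

    def step(self, detections):
        """Process one frame; return one track id per detection, in order."""
        predicted = {tid: t.predict() for tid, t in self.tracks.items()}
        pairs = sorted(
            ((iou(p, d), tid, j)
             for tid, p in predicted.items()
             for j, d in enumerate(detections)),
            reverse=True,
        )
        ids = [None] * len(detections)
        used = set()
        for score, tid, j in pairs:
            if score < self.iou_thresh:
                break                          # remaining pairs overlap too little
            if tid in used or ids[j] is not None:
                continue
            self.tracks[tid].update(detections[j])
            ids[j] = tid
            used.add(tid)
        for j, d in enumerate(detections):
            if ids[j] is None:                 # unmatched detection starts a new track
                self.tracks[self.next_id] = Track(d)
                ids[j] = self.next_id
                self.next_id += 1
        return ids
```

Calling `Tracker().step(boxes)` once per frame returns a track id for each detection. The sketch never retires unmatched tracks and ignores text content; a practical tracker would typically add both, and could replace the greedy loop with optimal assignment.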

Keywords: video text; detection; text tracking; end-to-end

彭亮、方思南、郑鉨彬


School of Geophysics and Petroleum Engineering, Yangtze University, Wuhan, Hubei 430000


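Funding: National Natural Science Foundation of China (42204127); Young and Middle-aged Talent Project of the Science and Technology Research Program of the Hubei Provincial Department of Education (Q20221304)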

2024

Journal of Fuyang Normal University (Natural Science)
Fuyang Normal College

Impact factor: 0.263
ISSN: 1004-4329
Year, volume (issue): 2024, 41(3)