Efficient title text detection using multi-loss

Full-text links: NETL, NSTL, Springer Nature

Abstract: YouTube's "Video Chapter" feature segments videos into different sections, marked by timestamps on the slider, enhancing user navigation. Given the vast volume of video data, processing these efficiently demands substantial time and computational resources. This paper addresses two key objectives: reducing the computational cost of deep model training for text detection and enhancing overall performance with minimal effort. We introduce a classroom-based multi-loss learning approach for text detection, extending its application to title detection without requiring annotations. In deep learning, loss functions play a crucial role in updating model weights. Our proposed multi-loss functions facilitate faster convergence compared to baseline methods. Additionally, we present a novel technique to handle annotation-less data by employing a text grouping method to differentiate between regular text and title text. Experimental results on the COCO-Text and Slidin' Videos AI-5G Challenge datasets demonstrate the efficacy and practicality of our approach.

Keywords:

Classroom learning; Deep learning; Multi-loss; Slidin' videos AI-5G challenge; Title text spotting

Authors:

Shitala Prasad, Anuj Abraham

Author affiliations:

School of Mathematics and Computer Science, Indian Institute of Technology Goa, Farmagudi, Ponda, Goa 403401, India

Technology Innovation Institute, Masdar City, Abu Dhabi 9639, UAE

Publication year:

2025

DOI:

10.1007/s10032-024-00500-y

International journal on document analysis and recognition: IJDAR

ISSN: 1433-2833

Year, volume (issue): 2025, 28(2)

Number of references: 47