Efficient title text detection using multi-loss

YouTube's "Video Chapter" feature segments a video into sections marked by timestamps on the slider, which makes navigation easier for users. Given the vast volume of video data, processing it efficiently demands substantial time and computational resources. This paper addresses two objectives: reducing the computational cost of training deep models for text detection, and improving overall performance with minimal effort. We introduce a classroom-based multi-loss learning approach for text detection and extend it to title detection without requiring annotations. In deep learning, loss functions play a crucial role in updating model weights, and our proposed multi-loss functions converge faster than baseline methods. We also present a novel technique for handling annotation-less data, which uses a text grouping method to differentiate between regular text and title text. Experimental results on the COCO-Text and Slidin' Videos AI-5G Challenge datasets demonstrate the efficacy and practicality of our approach.

Classroom learning; Deep learning; Multi-loss; Slidin' Videos AI-5G Challenge; Title text spotting

Shitala Prasad, Anuj Abraham

School of Mathematics and Computer Science, Indian Institute of Technology Goa, Farmagudi, Ponda, Goa 403401, India

Technology Innovation Institute, Masdar City, Abu Dhabi 9639, UAE

2025

International Journal on Document Analysis and Recognition (IJDAR)
  • 47