SQL-Net: Semantic Query Learning for Point-Supervised Temporal Action Localization

Point-supervised Temporal Action Localization (PS-TAL) detects the temporal intervals of actions in untrimmed videos under a label-efficient paradigm. However, most existing methods fail to learn action completeness without instance-level annotations, which results in fragmented predictions. The semantic information of snippets is crucial for detecting complete actions: snippets with similar representations should be assigned to the same action category. To address this issue, we propose a novel representation refinement framework with a semantic query mechanism that enhances the discriminability of snippet-level features. Concretely, we define a group of learnable queries, each representing a specific action category, and dynamically update them based on the video context. With the assistance of these queries, we search for the optimal action sequence that agrees with their semantics. In addition, we use reliable proposals as pseudo labels and design a refinement and completeness module that further refines temporal boundaries, so that the completeness of action instances is captured. Finally, we demonstrate the superiority of the proposed method over existing state-of-the-art approaches on the THUMOS14 and ActivityNet1.3 benchmarks. Notably, thanks to completeness learning, our algorithm achieves significant improvements under more stringent evaluation metrics.
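
The query mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`update_queries`, `classify_snippets`, `group_segments`), the attention-style update, the `momentum` blend, and the cosine-similarity assignment are all assumptions chosen only to show the general idea of one query per action category being adapted to the video context and then used to group similar snippets.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def update_queries(queries, snippets, momentum=0.5):
    """Hypothetical update: blend each class query with an
    attention-weighted mean of the video's snippet features."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    updated = []
    for q in queries:
        weights = softmax([dot(q, s) / scale for s in snippets])
        context = [sum(w * s[d] for w, s in zip(weights, snippets))
                   for d in range(dim)]
        updated.append([momentum * qd + (1 - momentum) * cd
                        for qd, cd in zip(q, context)])
    return updated

def classify_snippets(queries, snippets):
    """Assign each snippet to the class whose query is most similar (cosine)."""
    def cosine(a, b):
        na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
        return dot(a, b) / (na * nb) if na and nb else 0.0
    return [max(range(len(queries)), key=lambda c: cosine(queries[c], s))
            for s in snippets]

def group_segments(labels):
    """Merge consecutive snippets with the same label into
    (start, end, label) tuples, with end exclusive."""
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t, labels[start]))
            start = t
    return segments

# Toy example: two class queries and four 2-D snippet features (made-up values).
queries = [[1.0, 0.0], [0.0, 1.0]]
snippets = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
labels = classify_snippets(update_queries(queries, snippets), snippets)
segments = group_segments(labels)
```

In this toy setting, neighbouring snippets with similar features end up in the same segment, which is the intuition behind using semantics to avoid fragmented predictions. The paper's learnable queries, pseudo-label proposals, and refinement and completeness module are not modelled here.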

Semantics; Videos; Location awareness; Reliability; Proposals; Annotations; Accuracy; Training; Learning systems; Labeling

Yu Wang, Shengjie Zhao, Shiwei Chen

School of Software Engineering, Tongji University, Shanghai, China|Engineering Research Center of Key Software Technologies for Smart City Perception and Planning, Ministry of Education, Shanghai, China|Key Laboratory of Embedded System and Service Computing, Ministry of Education, Shanghai, China

Department of R&D Data, Microsoft Asia-Pacific Technology Company Ltd., Shanghai, China

2025

IEEE Transactions on Multimedia

ISSN:
Year, Volume (Issue): 2025, 27(1)
  • 80