ACTION SPOTTING AND BOUNDARY PREDICTION FOR TEMPORAL ACTION LOCALIZATION
Temporal action localization aims to find the start and end times (i.e., the temporal boundaries) of action instances in videos. Existing reinforcement learning methods have two drawbacks: 1) they repeatedly search the same video content; 2) their frame-level input carries insufficient semantic information. This paper therefore proposes a temporal action localization method based on action spotting and boundary prediction (ASBP). Action spotting is formulated as a reinforcement learning problem: each training video is re-encoded into a sequence of video units that serves as the environment, and agents equipped with memory modules interact with this environment, which includes an action instance removal mechanism. In this way, the agents learn to skip background by observing video units and to identify the units that contain action instances. Boundary prediction is formulated as a regression problem: a boundary prediction network directly predicts the temporal boundaries of an action instance from the video units discovered by the agents. Experimental results on THUMOS-14 show that the proposed method improves mAP@0.5 by 6.6% over state-of-the-art reinforcement learning methods, which validates its superiority.
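For readers unfamiliar with the mAP@0.5 metric: a predicted segment is conventionally counted as correct when its temporal intersection-over-union (tIoU) with a ground-truth instance of the same class is at least 0.5. The following is a minimal sketch of that criterion only, not of the ASBP method; the function names `temporal_iou` and `is_true_positive` and the example segments are illustrative assumptions and do not come from the paper.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Length of the overlap; zero when the segments are disjoint.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """Hypothetical helper: does the prediction meet the tIoU threshold?"""
    return temporal_iou(pred, gt) >= threshold


# Made-up segments for illustration only.
ground_truth = (0.0, 10.0)
loose_prediction = (5.0, 15.0)   # overlaps 5 s out of a 15 s union
tight_prediction = (2.0, 10.0)   # overlaps 8 s out of a 10 s union

print(is_true_positive(loose_prediction, ground_truth))
print(is_true_positive(tight_prediction, ground_truth))
```

A full mAP computation additionally ranks predictions by confidence, matches each ground-truth instance at most once, and averages the per-class average precision; those steps are omitted here.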