
A Survey of Group Activity Recognition Based on Deep Learning

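Unlike traditional action recognition, which deals with simple actions, group activity recognition requires understanding the complex semantics formed by the individual actions of several people in a scene and the interactions among them. In recent years, research on and applications of group activity recognition in areas such as public safety surveillance, sports video analysis, and social role understanding have attracted wide attention. However, few Chinese-language publications help researchers quickly grasp the state of the field, and the criteria they use to categorize and analyze existing work are rather coarse. To address this, this paper reviews the progress of deep learning-based group activity recognition over the past decade. First, it introduces the problem and definition of group activity recognition and summarizes both the core pipeline of existing solutions and the key challenges of this research. Next, it categorizes and analyzes the existing literature around two core topics: the extraction of individual action features and the modeling of their relations. Specifically, it introduces and summarizes the human behavior features commonly used in group activity research and groups existing relation-modeling approaches into three types: linear association, sequence association, and graph association. In addition, it lists twelve existing video datasets that can be used for group activity research and compares and analyzes currently popular methods on three commonly used datasets. Finally, it identifies several more challenging future research trends. In summary, this paper dissects the core research ideas and future trends of group activity recognition, helping researchers quickly understand the state of research in this area.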
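The three relation-modeling families (linear, sequence, and graph association) can be contrasted in a toy sketch. This is our own illustration, not any surveyed method: it assumes each person is already represented by a short feature vector, and the function names, the fixed `decay` in the recurrent update, and the softmax-over-dot-product edge weights are hypothetical choices standing in for components that real models learn end to end.

```python
"""Toy contrast of three ways to relate individual features into a group vector.

Hypothetical illustration: each person is a short feature vector, and all
weights are fixed by hand instead of being learned.
"""
import math
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise mean; accepts any number of people."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def linear_association(people: List[Vector]) -> Vector:
    # Linear: individuals are combined by an order-free pooling operation.
    return mean_pool(people)


def sequence_association(people: List[Vector], decay: float = 0.5) -> Vector:
    # Sequence: people are visited in a fixed order (e.g. left to right) and a
    # recurrent state accumulates context: h_t = tanh(decay * h_{t-1} + x_t).
    state = [0.0] * len(people[0])
    for x in people:
        state = [math.tanh(decay * h + xi) for h, xi in zip(state, x)]
    return state


def graph_association(people: List[Vector]) -> Vector:
    # Graph: a fully connected graph over people (self-loops included). Edge
    # weights are a softmax over dot-product similarity; each node takes the
    # weighted sum of its neighbours, then all nodes are mean-pooled.
    updated = []
    for xi in people:
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in people]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        updated.append([sum(w * xj[d] for w, xj in zip(weights, people))
                        for d in range(len(xi))])
    return mean_pool(updated)


if __name__ == "__main__":
    people = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print("linear  :", linear_association(people))
    print("sequence:", sequence_association(people))
    print("graph   :", graph_association(people))
```

Reversing the order of the people leaves the linear and graph outputs unchanged (up to floating-point rounding) but changes the sequence output, which illustrates why sequence-based models depend on how individuals are ordered.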
A Survey of Group Activity Recognition Based on Deep Learning
Unlike traditional action recognition, which focuses on single individuals, group activity recognition aims to understand the complex semantics composed of individual actions and their interactions within a scene. In recent years, applications of group activity recognition in domains such as public safety monitoring, sports video analysis, and social role understanding have attracted significant attention from researchers. However, few Chinese-language publications give a comprehensive overview of the research progress in this field, and the criteria they use to categorize and analyze existing work remain coarse. This paper aims to fill this gap by reviewing the progress of group activity recognition research over the past decade, with a particular focus on developments driven by deep learning. We first give a clear problem definition for group activity recognition, distinguishing it from individual action recognition by emphasizing the need to understand group dynamics and interactions. We then outline the basic pipeline shared by most group activity recognition approaches, which typically involves detecting and tracking individuals, extracting features relevant to their actions, recognizing individual actions, and aggregating these actions to infer the group activity. We also discuss the challenges inherent to this field, such as the variability in group sizes, the complexity of interactions, and the diversity of possible group activities across contexts. Turning to the core of group activity recognition research, this paper provides an in-depth analysis of two critical components: the extraction of individual action features and their association modeling. We introduce several deep learning-based video feature extraction methods commonly employed in the study of group activities; these methods capture the nuances of individual actions as well as the contextual information needed to understand group dynamics. We then categorize existing approaches to modeling the associations between individual actions into three types: linear association, sequence association, and graph association. Each type offers a different perspective on how individual actions interact and combine into a coherent group activity, ranging from simple linear relationships to complex, non-linear interactions represented by graphs. Furthermore, given the importance of empirical research in advancing the field, this paper lists 12 existing video datasets that can be used for group activity research. These datasets cover scenarios ranging from sports and public spaces to more controlled settings, offering diverse opportunities for testing and improving group activity recognition algorithms. We also compare existing methods on the two most popular datasets, highlighting their strengths and weaknesses and providing insights into their performance. In conclusion, this paper offers a comprehensive review of the advances in deep learning-based group activity recognition over the past decade, covering the problem definition, research challenges, feature extraction techniques, association modeling methods, evaluation datasets, and future research directions. By consolidating and analyzing existing knowledge, this review provides researchers with insights and guidance for further exploration and development in the field of group activity recognition.
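The basic pipeline described in the abstract (detection and tracking, feature extraction, individual action recognition, aggregation into a group activity) can be sketched end to end. This is a minimal, hypothetical sketch: detection and tracking are assumed to have already produced per-person feature tracks, a nearest-prototype classifier stands in for the learned classifiers used in practice, and the action labels, group labels, and prototype vectors are made up for illustration.

```python
"""Toy group activity recognition pipeline:
tracked people -> per-person features -> individual actions -> group activity.

Detection and tracking are assumed done; all labels and prototypes are
hypothetical stand-ins for learned models.
"""
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical class prototypes; a trained network would learn these.
ACTION_PROTOTYPES: Dict[str, Vector] = {"standing": [1.0, 0.0], "walking": [0.0, 1.0]}
GROUP_PROTOTYPES: Dict[str, Vector] = {"queueing": [0.9, 0.1], "crossing": [0.1, 0.9]}


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(x: Vector, prototypes: Dict[str, Vector]) -> str:
    """Label whose prototype has the smallest squared Euclidean distance to x."""
    return min(prototypes,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, prototypes[k])))


def recognize(tracks: Dict[int, List[Vector]]) -> Tuple[Dict[int, str], str]:
    """tracks maps a person id to that person's per-frame feature vectors."""
    # Feature extraction: average each person's track over time.
    features = {pid: mean_vector(frames) for pid, frames in tracks.items()}
    # Individual action recognition: classify each person separately.
    actions = {pid: nearest(f, ACTION_PROTOTYPES) for pid, f in features.items()}
    # Aggregation: mean-pool over people (any group size) and classify the result.
    group = nearest(mean_vector(list(features.values())), GROUP_PROTOTYPES)
    return actions, group


if __name__ == "__main__":
    tracks = {
        1: [[1.0, 0.0], [0.8, 0.2]],
        2: [[0.9, 0.1], [1.0, 0.0]],
        3: [[0.2, 0.8], [0.0, 1.0]],
    }
    actions, group = recognize(tracks)
    print(actions, group)  # {1: 'standing', 2: 'standing', 3: 'walking'} queueing
```

Mean pooling in the aggregation step is what lets the same code accept any number of tracked people, one simple response to the variable-group-size challenge; the surveyed methods replace it with the richer linear, sequence, or graph association models.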

Keywords: video understanding; action recognition; group activity recognition; deep learning; attention mechanism; recurrent neural network; graph model

Rui Yan, Xiaojing Ge, Peng Huang, Xiangbo Shu, Jinhui Tang


State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094


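Funding: Postdoctoral Fellowship Program of CPSF (GZB20230302); Jiangsu Funding Program for Excellent Postdoctoral Talent (2023ZB256); National Natural Science Foundation of China (62302208, 61925204, 62222207, 62072245); Natural Science Foundation of Jiangsu Province (BK20211520)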

2024

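Chinese Journal of Computers (计算机学报)
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences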


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN: 0254-4164
Year, volume (issue): 2024, 47(11)