A Survey of Group Activity Recognition Based on Deep Learning
Unlike traditional action recognition, which focuses on single individuals, group activity recognition aims to understand the complex semantics formed by individual actions and their interactions within a scene. In recent years, its applications in domains such as public safety monitoring, sports video analysis, and social role understanding have attracted significant attention from researchers. However, Chinese-language literature offering a comprehensive overview of research progress in this field is scarce, and the basis for categorizing and analyzing existing work remains unclear. This paper aims to fill this gap by reviewing the progress of group activity recognition research over the past decade, with a particular focus on developments enabled by deep learning. First, we give a clear problem definition for group activity recognition, distinguishing it from individual action recognition by emphasizing the role of group dynamics and interactions. We then outline the basic pipeline common to most approaches, which typically involves detecting and tracking individuals, extracting features relevant to their actions, recognizing individual actions, and aggregating these actions to infer the group activity. We also discuss the challenges inherent to the field, such as variability in group size, the complexity of interactions, and the diversity of group activities across contexts. Next, the paper analyzes in depth two core components: the extraction of individual action features and the modeling of associations between them. We introduce several deep learning-based video feature extraction methods commonly used in group activity research, which capture the nuances of individual actions and the contextual information needed to understand group dynamics. We then categorize existing approaches to modeling associations between individual actions into three types: linear association, sequence association, and graph association. Each type offers a different perspective on how individual actions interact and combine into coherent group activities, ranging from simple linear relationships to complex, non-linear interactions represented by graphs. Furthermore, recognizing the importance of empirical research, the paper lists 12 existing video datasets for group activity research. These datasets cover scenarios ranging from sports and public spaces to more controlled settings, offering diverse opportunities for testing and improving recognition algorithms. We also compare existing methods on the two most popular datasets, highlighting their strengths and weaknesses and providing insight into their performance. In summary, this paper reviews the advances in deep learning-based group activity recognition over the past decade, covering the problem definition, research challenges, feature extraction techniques, association modeling methods, evaluation datasets, and future research directions. By consolidating and analyzing existing knowledge, it provides researchers with insights and guidance for further work in this field.
video understanding; action recognition; group activity recognition; deep learning; attention mechanism; recurrent neural network; graph model
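
To make the three association types named in the abstract concrete, the following is a minimal sketch of the final pipeline stage, in which per-person features are aggregated into a single group descriptor. It is an illustration under stated assumptions, not a method from any surveyed work: the function names, the `decay` and `radius` parameters, the distance-based adjacency, and the toy feature vectors are all hypothetical, and the recurrent update is a hand-written stand-in for a learned recurrent network.

```python
import math


def linear_association(features):
    """Linear association (assumed form): element-wise mean pooling."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def sequence_association(features, decay=0.5):
    """Sequence association (assumed form): fold individuals in order
    with a simple recurrent update. `decay` is an illustrative constant,
    not a learned weight."""
    state = [0.0] * len(features[0])
    for feat in features:
        state = [math.tanh(decay * s + x) for s, x in zip(state, feat)]
    return state


def graph_association(features, positions, radius=2.0):
    """Graph association (assumed form): one round of message passing.
    Each person averages the features of everyone within `radius`
    (including themselves); the refined features are then mean-pooled."""
    refined = []
    for p in positions:
        neighbours = [
            features[j]
            for j, q in enumerate(positions)
            if math.dist(p, q) <= radius
        ]
        refined.append(linear_association(neighbours))
    return linear_association(refined)


# Toy input: three people, each with a 2-dimensional action feature
# and a 2-D position in the scene (all values are made up).
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
positions = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)]

print("linear:  ", linear_association(features))
print("sequence:", sequence_association(features))
print("graph:   ", graph_association(features, positions))
```

In this toy setting the linear variant ignores who is near whom, the sequence variant depends on the order in which individuals are visited, and the graph variant weights each person by their spatial neighbourhood. In a real system the descriptor would be passed to a classifier and all weights would be learned.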