Recognition of Basketball Tactics Based on Vision Transformer and Track Filter
Using machine learning to analyze player trajectory data and identify offensive or defensive tactics is a crucial component of understanding basketball video content. Traditional machine learning methods require feature variables to be defined manually, which significantly limits their flexibility. The key issue, therefore, is how to automatically extract feature information that can be used for tactic recognition. To address this issue, this study proposes a basketball Tactic Vision Transformer (TacViT) recognition model based on player trajectory data from National Basketball Association (NBA) games. The proposed model adopts the Vision Transformer (ViT) as its backbone network and uses multi-head attention modules to extract rich global trajectory features. Trajectory filters are also incorporated to enhance the feature interaction between the court lines and player trajectories and to strengthen the representation of player position features. The trajectory filters learn long-term spatial correlations in the frequency domain with log-linear complexity. A self-built basketball tactic dataset (PlayersTrack) is created by converting sequence data from the Sport Vision System (SportVU) into trajectory graphs. Experiments on this dataset showed that TacViT reached an accuracy of 82.5%, a 16.7% improvement over the accuracy of the unmodified Vision Transformer S model (ViT-S).
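The abstract does not give the exact form of the trajectory filter. The sketch below illustrates one common way to learn global spatial correlations in the frequency domain with log-linear cost, assuming a GFNet-style global filter: take a 2D FFT of the patch-feature grid, multiply element-wise by a learnable complex filter, and transform back. The function name `global_filter`, the `weight` parameter, and the (H, W, C) layout are hypothetical illustrations, not the authors' implementation. Only the forward pass is shown, in NumPy.

```python
import numpy as np


def global_filter(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical frequency-domain filter over a 2D grid of patch features.

    x:      real array of shape (H, W, C), e.g. ViT patch embeddings of a
            trajectory graph arranged on the spatial grid (assumed layout).
    weight: complex array of shape (H, W // 2 + 1, C); in a trained model this
            would be a learnable parameter (hypothetical name).
    Returns a real array of shape (H, W, C).
    """
    h, w, _ = x.shape
    # 2D real FFT over the two spatial axes: O(HW log HW) per channel.
    spectrum = np.fft.rfft2(x, axes=(0, 1))
    # Element-wise multiplication in the frequency domain equals a circular
    # convolution over the whole grid, so every patch can affect every other.
    spectrum = spectrum * weight
    # Inverse FFT back to the spatial domain, restoring the original width.
    return np.fft.irfft2(spectrum, s=(h, w), axes=(0, 1))


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))

# An all-ones spectrum is the identity filter.
identity = np.ones((8, 8 // 2 + 1, 4), dtype=complex)
print(np.allclose(global_filter(x, identity), x))  # True

# A filter built from a shifted delta kernel moves features one row down,
# showing the equivalence with a global (circular) convolution.
kernel = np.zeros((8, 8, 4))
kernel[1, 0, :] = 1.0
shift = np.fft.rfft2(kernel, axes=(0, 1))
print(np.allclose(global_filter(x, shift), np.roll(x, 1, axis=0)))  # True
```

The design choice this illustrates is cost: a dense spatial kernel covering the whole H×W grid would need O((HW)²) operations, whereas the FFT route needs O(HW log HW), which matches the log-linear complexity stated for the trajectory filters.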