Action sequences in video data vary in length, but conventional models take a fixed-length sequence of input frames and therefore neglect temporal features at other scales. To address this problem, an action recognition method based on a deep learning model that aggregates multiple convolutional networks is proposed. The network consists of three branches and takes images of different sequence lengths and modalities as input. The branches capture feature information at different scales layer by layer. At the end of the network, the features are aggregated and a softmax classifier produces the recognition result. Experimental results show that the model reaches an accuracy of 88.36% on the UCF101 dataset, which is higher than that of the comparison models in the experiments and makes the method competitive.
Key words
deep learning/action recognition/feature aggregation/residual structure/sequence characteristics
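
The final stage described in the abstract, aggregating the features of the three branches and classifying them with softmax, can be illustrated with a minimal sketch. The abstract does not say how the features are aggregated, so concatenation followed by a single linear layer is an assumption here, as are the feature sizes, the random weights, and the function names `softmax` and `aggregate_and_classify`. Only the three-branch layout, the softmax classifier, and the 101 classes of UCF101 come from the text.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_and_classify(branch_features, weights, bias):
    """Fuse per-branch feature vectors and return class probabilities.

    Aggregation by concatenation is an assumption of this sketch; the
    abstract only states that the branch features are aggregated.
    """
    # Concatenate the feature vectors of all branches into one vector.
    fused = [x for feats in branch_features for x in feats]
    # One linear layer: one weight row and one bias term per class.
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


random.seed(0)
NUM_CLASSES = 101          # UCF101 contains 101 action classes
FEATURE_DIMS = (8, 8, 16)  # hypothetical output sizes of the three branches

# Random stand-ins for the features each branch would extract.
branch_features = [
    [random.gauss(0.0, 1.0) for _ in range(dim)] for dim in FEATURE_DIMS
]
fused_dim = sum(FEATURE_DIMS)
# Random stand-ins for trained classifier parameters.
weights = [
    [random.gauss(0.0, 0.1) for _ in range(fused_dim)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

probs = aggregate_and_classify(branch_features, weights, bias)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

In a trained model the branch features would come from the convolutional branches and the classifier weights would be learned; the sketch only shows how the aggregated vector is turned into a probability distribution over the action classes.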