Speech emotion recognition with lightweight networks incorporating attention mechanisms
To address the lack of dialect databases and the low accuracy of existing recognition models in speech emotion recognition, a speech emotion database of the Liaoxi dialect was established, and a lightweight speech emotion recognition network integrating an attention mechanism was proposed. The model consists of four parts: a feature combination network, a CBAM attention mechanism, a deep convolutional network, and an output layer. Three parallel convolutions with different kernel sizes extract shallow speech features, which are then concatenated. The CBAM attention module is introduced to refine these input features. The fused features are fed into the deep convolutional network, which extracts deep speech features and outputs a multi-dimensional feature vector; the output layer then classifies the speech emotion. The model was evaluated on the IEMOCAP, Emo-DB, and Liaoxi dialect speech emotion databases, achieving accuracy rates of 82.5%, 96.2%, and 90.8%, respectively. Experimental results show that, compared with other deep learning models, the proposed model has fewer parameters and a higher recognition rate.
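The first two stages described above (parallel convolutions whose outputs are concatenated, followed by attention-based refinement) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the 1-D input, the kernel sizes 3, 5, and 7, the random weights, the hidden width of the shared MLP, and all function names are assumptions made for illustration. Only the channel-attention half of CBAM is shown; the spatial-attention half, the deep convolutional network, and the output layer are omitted.

```python
import math
import random


def conv1d_same(signal, kernel):
    """Cross-correlate a 1-D signal with an odd-length kernel.

    Zero padding keeps the output the same length as the input.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def parallel_branches(signal, kernels):
    """Apply one convolution per kernel and stack the results as channels."""
    return [conv1d_same(signal, kernel) for kernel in kernels]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU, shared by both pooled descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(features, w1, w2):
    """CBAM-style channel attention: sigmoid(MLP(avg_pool) + MLP(max_pool))."""
    avg = [sum(channel) / len(channel) for channel in features]
    mx = [max(channel) for channel in features]
    scores = [
        a + m
        for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))
    ]
    weights = [sigmoid(s) for s in scores]
    # Rescale every channel by its attention weight.
    refined = [[w * x for x in channel] for w, channel in zip(weights, features)]
    return refined, weights


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Stand-in for one row of acoustic features (assumed, not real speech data).
signal = [math.sin(0.3 * t) for t in range(32)]

# Three branches with different kernel sizes (sizes are assumptions).
kernels = [[rng.uniform(-1, 1) for _ in range(size)] for size in (3, 5, 7)]
features = parallel_branches(signal, kernels)  # 3 channels x 32 frames

# Shared MLP weights: 3 channels -> 2 hidden units -> 3 channels.
hidden_units = 2
w1 = [[rng.uniform(-1, 1) for _ in range(len(features))] for _ in range(hidden_units)]
w2 = [[rng.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(len(features))]

refined, weights = channel_attention(features, w1, w2)
print("channel attention weights:", [round(w, 3) for w in weights])
```

Because each branch uses zero padding, all three outputs have the same length and can be stacked as channels. The attention weights then rescale each channel before it would be passed to the deeper layers.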