Deep learning-based recognition models are the main tool for coal maceral recognition. However, the parameters of these models keep accumulating as layers are stacked, which increases the demand for computing power and lowers training efficiency. To address these problems, an improved Swin-Transformer model, DA-ViT, is constructed on the basis of a dilated convolutional self-attention (DCSA) mechanism. DCSA enhances the local feature information of coal maceral group images while retaining their two-dimensional spatial information. By decomposing the large-scale convolution kernel applied to coal microscopic images into multiple scales, it strengthens the relationships between pixels in different regions of the image and reduces the number of image attention parameters by 81.18%. DA-ViT combines DCSA with the improved Swin-Transformer framework to strengthen the correlation of morphological features among coal maceral images. Experimental results show that, compared with existing recognition models, DA-ViT improves prediction accuracy while significantly reducing computational requirements. Its maximum pixel accuracy (PA) and mean intersection over union (mIoU) are 92.14% and 63.18%, respectively, and its minimum total parameter count (Params) and floating-point operations (FLOPs) are 4.95 × 10⁶ and 8.99 × 10⁹, respectively.
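The abstract does not say exactly how DCSA decomposes the large kernel. One well-known scheme built from the same ingredients is the large-kernel attention of the Visual Attention Network (VAN). In that scheme, a K×K convolution is replaced by three steps: a (2d−1)×(2d−1) depthwise convolution, then a ⌈K/d⌉×⌈K/d⌉ depthwise convolution with dilation d, then a 1×1 convolution. The result is multiplied element-wise with the input as an attention map, so the (C, H, W) layout is never flattened into tokens.

The sketch below *assumes* DCSA follows this pattern. The function names and the values C = 64, K = 21 and d = 3 are hypothetical illustrations, not settings from the paper. Biases are ignored in the parameter counts.

```python
import math

import numpy as np


def depthwise_conv2d(x, w, dilation=1):
    """'Same'-padded depthwise 2-D cross-correlation (the CNN-library convention).

    x: (C, H, W) feature map; w: (C, k, k), one odd-sized k x k filter per channel.
    """
    _, h, wd = x.shape
    k = w.shape[-1]
    pad = dilation * (k - 1) // 2  # keeps H and W unchanged for odd k
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation  # dilated tap offsets
            out += w[:, i, j, None, None] * xp[:, di:di + h, dj:dj + wd]
    return out


def pointwise_conv(x, w):
    """1 x 1 convolution that mixes channels; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def dilated_conv_attention(x, w_local, w_dilated, w_point, dilation):
    """Hypothetical DCSA-style block: decomposed large kernel -> attention map -> x * attn."""
    attn = depthwise_conv2d(x, w_local)                 # (2d-1)x(2d-1): local detail
    attn = depthwise_conv2d(attn, w_dilated, dilation)  # ceil(K/d)^2, dilated: long range
    attn = pointwise_conv(attn, w_point)                # 1x1: channel mixing
    return attn * x  # element-wise gating keeps the 2-D spatial layout


def full_conv_params(channels, k):
    """Weights of one dense k x k convolution, channels -> channels (no bias)."""
    return channels * channels * k * k


def decomposed_params(channels, k, d):
    """Weights of the three-step decomposition above (no biases)."""
    local = (2 * d - 1) ** 2 * channels
    dilated = math.ceil(k / d) ** 2 * channels
    point = channels * channels
    return local + dilated + point


def receptive_field(k, d):
    """Receptive field of a (2d-1) kernel followed by a ceil(k/d) kernel with dilation d."""
    return (2 * d - 1) + d * (math.ceil(k / d) - 1)
```

With C = 64, K = 21 and d = 3, a dense 21×21 convolution needs 1,806,336 weights. The decomposition needs 8,832 and still covers a 23×23 receptive field. These illustrative numbers are unrelated to the 81.18% reduction reported for DA-ViT, which depends on the model's own configuration.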
Key words
dilated convolution/self-attention mechanism/coal maceral group/recognition model
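The PA and mIoU values reported above are per-pixel metrics. The following is a minimal sketch of their standard definitions, computed from a confusion matrix over flattened label maps:

- **PA** is the fraction of correctly labelled pixels.
- **mIoU** averages TP / (TP + FP + FN) over classes.

The function names and toy labels are hypothetical. Skipping classes that appear in neither the ground truth nor the prediction is one common convention among several.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Correctly classified pixels divided by all pixels."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(map(sum, cm))
    return correct / total


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = tp + fp + fn
        if denom > 0:  # convention: skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For the toy labels (0, 0, 1, 1, 2, 2) against predictions (0, 1, 1, 1, 2, 0), PA is 4/6 ≈ 0.667 while mIoU is 0.5. PA is dominated by frequent classes, whereas mIoU weights every class equally, so the two metrics can differ considerably.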