Journal of Beijing University of Chemical Technology (Natural Science Edition), 2024, Vol. 51, Issue 2: 120-129. DOI: 10.13543/j.bhxbzr.2024.02.013

A coal maceral group recognition model based on a dilated convolutional self-attention mechanism

Wu Mingyang (吴明阳) 1, Xi Zhenghao (奚峥皓) 1, Chen Junran (陈军然) 1, Xu Guozhong (徐国忠) 2

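Author information

  • 1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620
  • 2. School of Chemical Engineering, University of Science and Technology Liaoning, Anshan 114051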

Abstract

Deep learning recognition models are currently the main approach to coal maceral group recognition. However, these models keep stacking parameters during computation, which increases their demand for computing power and reduces training efficiency. To address these problems, an improved Swin-Transformer model named DA-ViT is constructed, based on a dilated convolutional self-attention (DCSA) mechanism. First, the DCSA mechanism is proposed to enhance the local feature information of coal maceral group images while retaining their two-dimensional spatial information. By decomposing the large convolution kernel applied to coal microscopic images at multiple scales, DCSA strengthens the relationships between pixels in different regions of the image and reduces the number of image attention parameters by 81.18%. Then, to strengthen the correlation of morphological features between coal maceral group images, DCSA is combined with the improved Swin-Transformer framework to form the DA-ViT recognition model. Experimental results show that, compared with other existing recognition models, DA-ViT improves prediction accuracy while significantly reducing computational requirements. The maximum values of pixel accuracy (PA) and mean intersection over union (mIoU) are 92.14% and 63.18%, respectively, and the minimum values of the total number of model parameters (Params) and floating point operations (FLOPs) are 4.95 × 10^6 and 8.99 × 10^9, respectively.
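The two accuracy metrics named in the abstract, PA and mIoU, follow standard segmentation definitions. The sketch below shows how they are commonly computed from a confusion matrix. It is not taken from the paper: the function names, the three-class toy labels, and the choice to skip classes that appear in neither the ground truth nor the prediction are all assumptions made for illustration.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """PA: correctly labelled pixels divided by all pixels."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """mIoU: average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Hypothetical flattened label maps with three classes (0, 1, 2).
truth = [0, 0, 1, 1, 2, 2, 2, 0]
pred = [0, 1, 1, 1, 2, 2, 0, 0]

cm = confusion_matrix(truth, pred, num_classes=3)
pa = pixel_accuracy(cm)
miou = mean_iou(cm)
print(f"PA = {pa:.4f}, mIoU = {miou:.4f}")
```

On this toy input PA is higher than mIoU. PA pools all pixels, so frequent classes dominate it, while mIoU weights every class equally. That difference in weighting is one reason the two metrics can differ widely on the same model, although the abstract does not say what causes the gap between its 92.14% and 63.18%.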

Key words

dilated convolution/self-attention mechanism/coal maceral group/recognition model

Funding

National Natural Science Foundation of China (12104289)

Publication year

2024
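Journal: Journal of Beijing University of Chemical Technology (Natural Science Edition)
Publisher: Beijing University of Chemical Technology
Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.399
ISSN: 1671-4628
Number of references: 35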