Architectural style classification algorithm fusing CNN and Transformer

Accurate classification of architectural style is important for the study of architectural culture and the history of human civilization. Models based on convolutional neural networks (CNNs) have achieved good results in architectural style classification thanks to their strong feature extraction ability. However, most current CNN models extract only the local features of buildings, whereas Transformer-based models can extract global features through the attention mechanism. To improve classification accuracy, an architectural style classification method that fuses CNN and Transformer is proposed. The core of the network is the CT-Block structure, which is split along the channel dimension into a CNN branch and a Transformer branch; the features pass through the two branches separately and are then concatenated. This structure not only fuses the local features extracted by the CNN with the global features extracted by the Transformer, but also alleviates the growth in model size and parameter count caused by a two-branch design. On the Architectural Style Dataset and the WikiChurches dataset, the algorithm achieves accuracies of 79.83% and 68.41%, respectively, outperforming other models in the field of architectural style classification.
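The channel-split idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the layers inside CT-Block, so the names `local_branch`, `global_branch`, and `ct_block_sketch` are hypothetical, the 3-tap mean filter is only a stand-in for a convolution, and the projection-free self-attention is only a stand-in for a Transformer layer. The input is assumed to be a list of channels, each a list of values over positions, with at least two channels.

```python
import math


def local_branch(x):
    """Stand-in for the CNN branch (assumption): 3-tap mean filter
    along positions with zero padding, applied per channel."""
    out = []
    for ch in x:
        padded = [0.0] + list(ch) + [0.0]
        out.append([(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
                    for i in range(len(ch))])
    return out


def global_branch(x):
    """Stand-in for the Transformer branch (assumption): single-head
    self-attention over positions, without learned projections."""
    c, n = len(x), len(x[0])
    # One token per position; each token holds that position's channel values.
    tokens = [[x[k][i] for k in range(c)] for i in range(n)]
    scale = math.sqrt(c)
    mixed = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed.append([sum(w * t[d] for w, t in zip(weights, tokens))
                      for d in range(c)])
    # Back to channel-major layout.
    return [[mixed[i][k] for i in range(n)] for k in range(c)]


def ct_block_sketch(x):
    """Split channels in half, run one branch on each half, then
    concatenate the results along the channel dimension."""
    split = len(x) // 2
    return local_branch(x[:split]) + global_branch(x[split:])


# Toy input: 4 channels x 4 positions.
features = [[float(i + 4 * k) for i in range(4)] for k in range(4)]
out = ct_block_sketch(features)
print(len(out), len(out[0]))
```

Because each branch sees only half of the channels, the output keeps the input's channel count, which is one plausible reading of how the split limits the growth in model size that a two-branch design would otherwise cause.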

architectural style classification; convolutional neural network; Transformer model; network fusion; attention mechanism

Liu Dong (刘东), Zhang Rongfu (张荣福), Qin Junxiang (秦俊祥), Gong Junzhe (龚俊哲), Cao Zhibin (曹志彬)

School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

2024

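Optical Instruments (光学仪器)
China Instrument and Control Society; Shanghai Institute of Optical Instruments; Engineering Optics Committee of the Chinese Optical Society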

Impact factor: 0.432
ISSN: 1005-5630
Year, volume(issue): 2024, 46(5)