Architectural Style Classification Algorithm Fusing CNN and Transformer
Liu Dong (刘东) 1, Zhang Rongfu (张荣福) 1, Qin Junxiang (秦俊祥) 1, Gong Junzhe (龚俊哲) 1, Cao Zhibin (曹志彬) 1
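Author information
- 1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China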
Abstract
Accurate classification of architectural style is of great significance to the study of architectural culture and the history of human civilization. Models based on convolutional neural networks (CNNs) have achieved good performance in architectural style classification thanks to their strong feature extraction ability. However, most current CNN models extract only the local features of buildings, whereas Transformer-based models, through the attention mechanism, can extract global features. To improve the accuracy of architectural style classification, a method that fuses CNN and Transformer is proposed. The core of the network is the CT-Block structure, which is split along the channel dimension into a CNN branch and a Transformer branch; the features pass through the two branches separately and are then concatenated. This structure not only fuses the local features extracted by the CNN with the global features extracted by the Transformer, but also alleviates the growth in model size and parameter count that a two-branch structure would otherwise bring. On the Architectural Style Dataset and the WikiChurches dataset, the algorithm achieves accuracies of 79.83% and 68.41%, respectively, outperforming other models in the field of architectural style classification.
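The channel-split idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the split ratio, layer types, activation, or attention configuration, so the equal split, the single 3x3 convolution with ReLU, the single-head attention with a residual connection, and all function and parameter names (`conv3x3`, `self_attention`, `ct_block`, `init_params`, `count_params`) are assumptions made purely for illustration.

```python
import numpy as np

def conv3x3(x, w):
    """3x3 convolution with zero padding. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over all H*W positions. x: (C, H, W)."""
    c, h, wd = x.shape
    tokens = x.reshape(c, h * wd).T  # (N, C), one token per spatial position
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(c)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over all positions
    return (attn @ v).T.reshape(c, h, wd)

def ct_block(x, params, split):
    """Hypothetical CT-Block: split channels, run two branches, concatenate."""
    x_cnn, x_tr = x[:split], x[split:]
    # CNN branch (assumed: one 3x3 conv + ReLU) sees only a local 3x3 window.
    y_cnn = np.maximum(conv3x3(x_cnn, params["conv"]), 0.0)
    # Transformer branch (assumed: attention + residual) mixes every position.
    y_tr = x_tr + self_attention(x_tr, params["wq"], params["wk"], params["wv"])
    return np.concatenate([y_cnn, y_tr], axis=0)

def init_params(channels, split, rng):
    """Random weights sized for the two half-width branches."""
    c_tr = channels - split
    return {
        "conv": rng.standard_normal((split, split, 3, 3)) * 0.1,
        "wq": rng.standard_normal((c_tr, c_tr)) * 0.1,
        "wk": rng.standard_normal((c_tr, c_tr)) * 0.1,
        "wv": rng.standard_normal((c_tr, c_tr)) * 0.1,
    }

def count_params(params):
    """Total number of weights in the block."""
    return sum(p.size for p in params.values())

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))      # 8 channels, 6x6 feature map
params = init_params(8, 4, rng)         # assumed equal 4/4 channel split
y = ct_block(x, params, split=4)
print(y.shape)                          # (8, 6, 6)
print(count_params(params))             # 192
```

Under these assumptions, each branch operates on half of the channels, so the block holds 4·4·9 + 3·4·4 = 192 weights. Running the same two branches at full width (8 channels each) would need 8·8·9 + 3·8·8 = 768, which is one way to read the abstract's claim that the channel split limits the parameter growth of a two-branch design.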
Key words
architectural style classification/convolutional neural network/Transformer model/network fusion/attention mechanism
Publication year
2024