
Scene recognition using multiple representation network

In recent years, with the rapid development of convolutional neural networks (CNNs), many computer vision tasks have been successfully addressed. However, scene recognition remains a difficult and challenging problem because of the complexity of scene images. With the emergence of large-scale scene datasets, a single representation generated by a plain CNN is no longer discriminative enough to describe the massive number of scene images. Therefore, in this paper, we propose a comprehensive representation for scene recognition that consists of an enhanced global scene representation, a local salient scene representation, and a local contextual object representation. We use two pretrained CNNs to extract the original feature maps from which these multiple representations are constructed. Specifically, we adopt class activation mapping (CAM) to find salient regions and extract local scene features, and we employ a bidirectional long short-term memory (LSTM) module to encode the contextual information of objects present in a scene. In addition, the multiple representations are generated by an end-to-end trainable model, which we call MRNet (multiple representation network). Experimental results on three publicly available scene recognition datasets demonstrate that our proposed model outperforms state-of-the-art models.
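The abstract uses CAM to locate salient regions before local scene features are extracted. The sketch below shows the standard CAM formulation (Zhou et al., 2016), in which the map for class c is the classifier-weighted sum of the last convolutional feature maps, M_c(y, x) = Σ_k w_k^c f_k(y, x), followed by a hypothetical thresholding step that turns the map into a bounding box. It is a minimal, dependency-free illustration under stated assumptions, not MRNet's implementation: the nested-list layout, the helper names `class_activation_map`, `min_max_normalize` and `salient_box`, the default threshold of 0.5, and the toy feature maps and weights are all illustrative choices; the paper's actual region-selection rule is not given in this abstract.

```python
from typing import List, Tuple

# Hypothetical layout: K feature maps, each H x W, stored as nested lists.
FeatureMaps = List[List[List[float]]]
Grid = List[List[float]]


def class_activation_map(features: FeatureMaps, class_weights: List[float]) -> Grid:
    """Standard CAM for one class c: M_c(y, x) = sum_k w_k^c * f_k(y, x).

    class_weights holds the classifier weights w_k^c that follow global
    average pooling, one weight per feature channel k.
    """
    if len(features) != len(class_weights):
        raise ValueError("expected one class weight per feature channel")
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def min_max_normalize(cam: Grid) -> Grid:
    """Rescale activations to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:
        return [[0.0] * len(row) for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


def salient_box(cam: Grid, threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Hypothetical region selection: the tightest inclusive box
    (y0, x0, y1, x1) around cells whose normalized activation is
    >= threshold. Falls back to the whole map when no cell qualifies."""
    norm = min_max_normalize(cam)
    cells = [(y, x) for y, row in enumerate(norm)
             for x, v in enumerate(row) if v >= threshold]
    if not cells:
        return 0, 0, len(cam) - 1, len(cam[0]) - 1
    ys = [y for y, _ in cells]
    xs = [x for _, x in cells]
    return min(ys), min(xs), max(ys), max(xs)


if __name__ == "__main__":
    # Toy input: two 3x3 channels and made-up weights for one scene class.
    features = [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 2.0], [0.0, 2.0, 4.0]],
        [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    ]
    cam = class_activation_map(features, [1.0, 0.5])
    print(cam)               # [[0.5, 0.0, 0.0], [0.0, 1.0, 2.0], [0.0, 2.0, 4.0]]
    print(salient_box(cam))  # (1, 1, 2, 2)
```

In a pipeline like the one the abstract describes, a box of this kind (scaled back to image coordinates) could be cropped from the input image and passed through a CNN to yield a local salient scene feature; how MRNet actually chooses, sizes, and counts its salient regions is not specified here.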

Keywords: Class activation mapping; Convolutional neural networks; End-to-end trainable; Long short-term memory; Scene recognition

Lin C., Lee F., Xie L., Cai J., Liu L., Chen Q., Chen H.


Shanghai Engineering Research Center of Assistive Devices, University of Shanghai for Science and Technology

School of Information Engineering, Nanchang University

Major of Electrical Engineering and Electronics, Graduate School of Engineering, Kogakuin University

2022

Applied Soft Computing


EI, SCI
ISSN:1568-4946
Year, Volume (Issue): 2022, 118