Scene recognition using multiple representation network

NSTL
Elsevier

Abstract: In recent years, with the rapid development of convolutional neural networks (CNNs), a series of computer vision tasks have been solved. However, scene recognition is still a difficult and challenging problem due to the complexity of scene images. With the emergence of large-scale scene datasets, a single representation generated by a plain CNN is no longer discriminative enough to describe massive scene images. Therefore, in this paper, we propose a comprehensive representation for scene recognition, including enhanced global scene representation, local salient scene representation, and local contextual object representation. We use two pretrained CNNs to extract original feature maps to construct the multiple representations. Specifically, we adopt class activation mapping (CAM) to find salient regions and extract local scene features, and we employ a bidirectional long short-term memory (LSTM) module to encode contextual information of objects existing in a scene. In addition, the multiple representations are generated by an end-to-end trainable model, which we call MRNet (multiple representation network). Experimental results on three publicly available scene recognition datasets demonstrate that our proposed model is superior to state-of-the-art models.
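As a rough illustration of the class activation mapping step mentioned in the abstract, the sketch below computes a CAM as a weighted sum of feature-map channels and then marks the cells near the peak as salient. It is a minimal sketch under stated assumptions, not the MRNet implementation: the function names, the toy values, and the 0.5 peak-ratio threshold are all hypothetical and do not come from the paper.

```python
from typing import List

FeatureMaps = List[List[List[float]]]  # K channels, each an H x W grid


def class_activation_map(feature_maps: FeatureMaps,
                         class_weights: List[float]) -> List[List[float]]:
    """Weighted sum of channel maps: CAM(x, y) = sum_k w_k * f_k(x, y)."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("need one weight per feature-map channel")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, channel in zip(class_weights, feature_maps):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    return cam


def salient_mask(cam: List[List[float]],
                 threshold_ratio: float = 0.5) -> List[List[bool]]:
    """Mark cells whose activation is at least threshold_ratio of the peak.

    The peak-ratio rule is an assumption made for this sketch; it also
    assumes the peak activation is positive.
    """
    peak = max(max(row) for row in cam)
    return [[value >= threshold_ratio * peak for value in row] for row in cam]


# Toy input (hypothetical values): 2 channels on a 2 x 2 grid.
toy_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
toy_weights = [0.5, 1.0]  # classifier weights for one scene class

toy_cam = class_activation_map(toy_maps, toy_weights)
toy_mask = salient_mask(toy_cam)
```

In a real pipeline the channel maps would come from the last convolutional layer of a pretrained CNN and the weights from the classifier layer for the predicted class; the salient cells would then be mapped back to image regions for local feature extraction.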

Keywords:

Class activation mapping; Convolutional neural networks; End-to-end trainable; Long short-term memory; Scene recognition

Authors:

Lin C., Lee F., Xie L., Cai J., Liu L., Chen Q., Chen H.

Author affiliations:

Shanghai Engineering Research Center of Assistive Devices, University of Shanghai for Science and Technology

School of Information Engineering, Nanchang University

Major of Electrical Engineering and Electronics, Graduate School of Engineering, Kogakuin University

Publication year:

2022

DOI:

10.1016/j.asoc.2022.108530

Applied Soft Computing

EI, SCI

ISSN: 1568-4946

Year, volume (issue): 2022, 118

Citations: 6
References: 63