Journal information
Pattern Recognition
Pergamon
ISSN: 0031-3203

Indexed in: SCI, AHCI, ISTP, EI
Officially published
Coverage years

    Infinite-dimensional feature aggregation via a factorized bilinear model

    Dai, Jindou; Wu, Yuwei; Gao, Zhi; Jia, Yunde...
    10 pages
    Abstract: Aggregating infinite-dimensional features has demonstrated superiority over their finite-dimensional counterparts. However, most existing methods approximate infinite-dimensional features with finite-dimensional representations, which inevitably results in approximation error and inferior performance. In this paper, we propose a non-approximate aggregation method that directly aggregates infinite-dimensional features rather than relying on approximation strategies. Specifically, since infinite-dimensional features are infeasible to store, represent, and compute explicitly, we introduce a factorized bilinear model to capture pairwise second-order statistics of infinite-dimensional features as a global descriptor. It enables the resulting aggregation formulation to involve only the inner product in an infinite-dimensional space. The factorized bilinear model is calculated by a Sigmoid kernel to generate informative features containing infinite-order statistics. Experiments on four visual tasks, including fine-grained, indoor scene, texture, and material classification, demonstrate that our method consistently achieves state-of-the-art performance. (c) 2021 Elsevier Ltd. All rights reserved.
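
    A minimal sketch of the kernel-trick idea described above, assuming the aggregation only ever needs pairwise inner products of the implicitly lifted features; the sigmoid-kernel parameters and the final pooling are illustrative stand-ins, not the paper's exact factorized bilinear formulation.

```python
import torch

def sigmoid_kernel(x, y, alpha=1.0, c=0.0):
    # k(x, y) = tanh(alpha * <x, y> + c): an inner product in an implicit
    # infinite-dimensional feature space, evaluated without forming phi(x).
    return torch.tanh(alpha * x @ y.t() + c)

def aggregate(local_feats, alpha=1.0, c=0.0):
    # local_feats: (N, d) local CNN features of one image. Pairwise second-order
    # statistics of the lifted features reduce to kernel evaluations k(x_i, x_j);
    # the mean-pooling into a global descriptor below is illustrative only.
    K = sigmoid_kernel(local_feats, local_feats, alpha, c)  # (N, N) Gram matrix
    return K.mean(dim=1)                                    # (N,) global descriptor

feats = torch.randn(49, 512)   # e.g. a 7x7 conv feature map, flattened spatially
print(aggregate(feats).shape)  # torch.Size([49])
```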

    Face photo-sketch synthesis via full-scale identity supervision

    Wang, Nannan; Li, Jie; Hu, Qinghua; Gao, Xinbo...
    11 pages
    Abstract: Face photo-sketch synthesis refers to transforming a face image between the photo domain and the sketch domain. It plays a crucial role in law enforcement and digital entertainment. A great deal of effort has been devoted to face photo-sketch synthesis. However, limited by weak identity supervision, existing methods mostly yield indistinct details or great deformation, resulting in poor perceptual appearance or low recognition accuracy. In the past several years, face identification has achieved great progress and now represents face images much more precisely than before. Considering that face image translation is also a type of face image re-representation, we attempt to introduce face recognition models to improve the synthesis performance. First, we apply existing synthesis models to augment the training set. Then, we propose a full-scale identity supervision method to reduce the redundant information introduced by these pseudo samples and use the valid information to enhance the intra-class variations. The proposed framework consists of two sub-networks: a cross-domain translation (CT) network and an intra-domain adaptation (IA) network. The CT network translates the input image from the source domain to a latent image of the target domain, which overcomes the large gap between the two domains with less structural deformation. The IA network adapts the perceptual appearance of the latent image to the target image by adversarial learning. Experimental results on the CUHK Face Sketch Database and the CUHK Face Sketch FERET Database demonstrate that the proposed method preserves the best perceptual appearance and more distinct details with less deformation. (c) 2021 Elsevier Ltd. All rights reserved.
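
    A toy sketch of how the CT/IA sub-networks and the identity supervision might be wired together; all modules below are tiny placeholders and the equal loss weighting is an assumption, since the abstract does not give the exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in modules: the real CT/IA generators, discriminator and frozen face
# recognizer are far larger networks; only the loss wiring is sketched here.
ct_net = nn.Conv2d(3, 1, 3, padding=1)   # cross-domain translation: photo -> latent sketch
ia_net = nn.Conv2d(1, 1, 3, padding=1)   # intra-domain adaptation: latent -> sketch
disc   = nn.Conv2d(1, 1, 3, padding=1)   # adversarial discriminator (patch-level)
id_net = nn.AdaptiveAvgPool2d(1)         # placeholder identity-feature extractor

photo  = torch.randn(2, 3, 64, 64)
sketch = torch.randn(2, 1, 64, 64)

latent = ct_net(photo)                   # CT network bridges the domain gap
fake   = ia_net(latent)                  # IA network adapts perceptual appearance

adv = -disc(fake).mean()                                    # adversarial term
pix = F.l1_loss(latent, sketch)                             # structural guidance for CT
idl = 1 - F.cosine_similarity(id_net(fake).flatten(1),
                              id_net(sketch).flatten(1)).mean()  # identity supervision

loss = adv + pix + idl                   # equal weighting is an assumption
loss.backward()
print(float(loss))
```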

    Scene-specific crowd counting using synthetic training images

    Delussu, Rita; Putzu, Lorenzo; Fumera, Giorgio
    14 pages
    Abstract: Crowd counting is a computer vision task on which considerable progress has recently been made thanks to convolutional neural networks. However, it remains challenging, even in scene-specific settings, in real-world application scenarios where no representative images of the target scene, not even unlabelled ones, are available for training or fine-tuning a crowd counting model. Inspired by previous work in other computer vision tasks, we propose a simple but effective solution for this application scenario, which consists of automatically building a scene-specific training set of synthetic images. Our solution does not require any manual annotation effort from end users, nor the collection of representative images of the target scene. Extensive experiments on several benchmark data sets show that the proposed solution can improve the effectiveness of existing crowd counting methods. (C) 2021 Elsevier Ltd. All rights reserved.
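
    A minimal sketch of building one scene-specific synthetic training pair, assuming synthetic people are pasted onto a background of the target scene and the density-map label is a smoothed sum of impulses at assumed head positions; the paper's actual synthesis pipeline is more elaborate than this.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def synth_pair(background, person_patch, n_people):
    # Paste n_people copies of a synthetic person patch onto a background image
    # of the target scene and build the density-map ground truth by placing one
    # unit impulse per person and smoothing it (standard CNN counting label).
    img = background.copy()
    H, W = img.shape[:2]
    ph, pw = person_patch.shape[:2]
    density = np.zeros((H, W), dtype=np.float32)
    for _ in range(n_people):
        y = rng.integers(0, H - ph)
        x = rng.integers(0, W - pw)
        img[y:y + ph, x:x + pw] = person_patch    # naive paste; real pipelines blend/scale
        density[y + ph // 4, x + pw // 2] += 1.0  # assumed head position inside the patch
    return img, gaussian_filter(density, sigma=4)

bg = np.zeros((240, 320, 3), dtype=np.uint8)
person = np.full((24, 10, 3), 255, dtype=np.uint8)
image, dmap = synth_pair(bg, person, n_people=50)
print(image.shape, dmap.sum())   # the density map integrates to roughly 50
```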

    Discriminative deep attributes for generalized zero-shot learning

    Kim, Hoseong; Lee, Jewook; Byun, Hyeran
    11 pages
    Abstract: In generalized zero-shot learning (GZSL), a class is predicted indirectly by deriving user-defined (i.e., existing) attributes (UA) from an image. High-quality attributes are essential for GZSL, but the existing UAs are sometimes not discriminative. We observe that the hidden units at each layer of a convolutional neural network (CNN) contain highly discriminative semantic information across a range of objects, parts, scenes, textures, materials, and colors. The semantic information in CNN features is similar to the attributes that can distinguish each class. Motivated by this observation, we employ CNN features as novel class-representative semantic data, i.e., deep attributes (DA). Specifically, we propose three objective functions (compatible, discriminative, and intra-independent) to inject these fundamental properties into the generated DA. We substantially outperform state-of-the-art approaches on four challenging GZSL datasets: CUB, FLO, AWA1, and SUN. Furthermore, the existing UA and our proposed DA are complementary and can be combined to further enhance performance. (c) 2021 Elsevier Ltd. All rights reserved.
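
    A hedged sketch of how the three named properties of deep attributes could be turned into losses; the concrete loss forms and weights below are assumptions, as the abstract only names the properties (compatible, discriminative, intra-independent).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feats = torch.randn(32, 512)            # CNN features for a batch of images
labels = torch.randint(0, 10, (32,))
proj = torch.nn.Linear(512, 64)         # maps CNN features to deep attributes (DA)
class_emb = torch.nn.Embedding(10, 64)  # per-class attribute prototypes
cls_head = torch.nn.Linear(64, 10)

da = proj(feats)

# 1) "compatible": the DA of an image should match its class prototype.
compat = F.mse_loss(da, class_emb(labels))
# 2) "discriminative": the DA alone should suffice to classify the image.
disc = F.cross_entropy(cls_head(da), labels)
# 3) "intra-independent": decorrelate DA dimensions (penalize off-diagonal covariance).
z = da - da.mean(0, keepdim=True)
cov = z.t() @ z / (len(da) - 1)
indep = (cov - torch.diag(torch.diagonal(cov))).pow(2).sum()

loss = compat + disc + 0.01 * indep     # loss forms and weights are assumptions
loss.backward()
print(float(loss))
```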

    Loss function search for person re-identification

    Gu, Hongyang; Li, Jianmin; Fu, Guangyuan; Yue, Min...
    14 pages
    Abstract: In recent years, person re-identification, which learns discriminative features for the person retrieval problem across non-overlapping cameras, has attracted extensive attention. One of the main challenges in person re-identification with deep neural networks is the design of the loss function, which plays a vital role in improving the discrimination of the learned features. However, most existing models use hand-designed loss functions, which are usually sub-optimal and time-consuming to craft. The search spaces of the two existing AutoML-based methods are either too complicated or too simple to include various forms of loss functions. To address the limitations of these search spaces, in this paper we propose an AutoML loss function search method for person ReID, named LFS-ReID, within the framework of the margin-based softmax loss function. Specifically, we first analyze the margin-based softmax loss function and identify four key properties. Then we carefully design a sampling distribution based on non-independent truncated Gaussian distributions to sample loss functions that conform to these four properties. Finally, a reinforcement learning based method is adopted to dynamically optimize the sampling distribution. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on four commonly used datasets.
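
    A small sketch of the margin-based softmax family that such a search operates in, with a crude truncated-Gaussian sampler for one margin parameter; the (m1, m2, m3) parametrization and the resampling loop are assumptions, and the reinforcement-learning update of the distribution is omitted.

```python
import torch
import torch.nn.functional as F

def margin_softmax_loss(logits_cos, labels, m1=1.0, m2=0.0, m3=0.0, s=30.0):
    # Margin-based softmax family: the target-class cosine cos(theta_y) is
    # replaced by cos(m1 * theta_y + m2) - m3 before the scaled softmax.
    theta = torch.acos(logits_cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target_theta = theta.gather(1, labels[:, None])
    target = torch.cos(m1 * target_theta + m2) - m3
    modified = logits_cos.scatter(1, labels[:, None], target)
    return F.cross_entropy(s * modified, labels)

def sample_margin(mu, sigma, lo, hi):
    # Truncated-Gaussian sampling by resampling: a crude stand-in for the
    # paper's non-independent truncated distributions.
    x = torch.normal(mu, sigma)
    while not (lo <= x <= hi):
        x = torch.normal(mu, sigma)
    return float(x)

cos = F.normalize(torch.randn(8, 10), dim=1)   # fake cosine logits in [-1, 1]
labels = torch.randint(0, 10, (8,))
m2 = sample_margin(torch.tensor(0.3), torch.tensor(0.1), 0.0, 0.8)
print(margin_softmax_loss(cos, labels, m2=m2))
```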

    Sparse attention block: Aggregating contextual information for object detection

    Chen, Chunlin; Yu, Jun; Ling, Qiang
    12 pages
    Abstract: It is well recognized that the contextual information of surrounding objects is beneficial for object detection. Such contextual information can often be obtained from long-range dependencies. This paper proposes a sparse attention block to capture long-range dependencies efficiently. Unlike the conventional non-local block, which generates a dense attention map to characterize the dependency between any two positions of the input feature map, our sparse attention block samples the most representative positions for contextual information aggregation. After searching for local peaks in a heat map of the given input feature map, it adaptively selects a sparse set of positions to represent the relationship between query and key elements. With the obtained sparse positions, our sparse attention block can model long-range dependencies well and greatly improve object detection performance, at an additional cost of less than 2% of the GPU memory and computation of the conventional non-local block. The sparse attention block can be easily plugged into various object detection frameworks, such as Faster R-CNN, RetinaNet, and Mask R-CNN. Experiments on the COCO benchmark confirm that our sparse attention block boosts detection accuracy with significant gains ranging from 1.4% to 1.9% and negligible overhead in computation and memory usage.
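
    A minimal sketch of the sparse attention idea, assuming local peaks of an activation heat map serve as the only key/value positions; the heat-map construction, the 3x3 peak finding and the value of k are simplified stand-ins for the paper's selection scheme.

```python
import torch
import torch.nn.functional as F

def sparse_attention(x, k=16):
    # x: (B, C, H, W). Instead of attending to all H*W key positions as a
    # non-local block does, keep only the k strongest local peaks of a heat
    # map as keys/values, then aggregate context from them at every query.
    B, C, H, W = x.shape
    heat = x.mean(1, keepdim=True)                          # (B, 1, H, W) heat map
    peaks = (heat == F.max_pool2d(heat, 3, 1, 1)).float()   # 3x3 local-maximum mask
    idx = (heat * peaks).flatten(2).topk(k, dim=2).indices  # (B, 1, k) sparse positions

    feats = x.flatten(2)                                    # (B, C, H*W)
    keys = feats.gather(2, idx.expand(B, C, k))             # (B, C, k)
    attn = torch.softmax(feats.transpose(1, 2) @ keys / C ** 0.5, dim=-1)  # (B, H*W, k)
    ctx = (attn @ keys.transpose(1, 2)).transpose(1, 2)     # (B, C, H*W)
    return x + ctx.view(B, C, H, W)                         # residual connection

y = sparse_attention(torch.randn(2, 64, 32, 32))
print(y.shape)   # torch.Size([2, 64, 32, 32])
```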

    Weakly-supervised semantic segmentation with superpixel guided local and global consistency

    Ma, Huimin; Wang, Xiang; Hu, Tianyu; Li, Xi...
    10 pages
    Abstract: Weakly supervised semantic segmentation aims to learn a segmentation model with only image-level annotations. Existing methods generally refine the initial seeds to obtain pseudo labels for training a fully supervised model. In recent years, some affinity-based methods have performed well on this task. However, most of these methods focus only on the localization information from class activation maps, while ignoring rule-based appearance information. In this paper, we find that superpixel guidance is helpful for mining semantic affinities between pixels, because pixels belonging to the same superpixel often share the same class label. We therefore propose a superpixel-guided weakly supervised segmentation framework, which alternately learns two modules to fuse superpixel information and localization information. Through our framework, the semantic segmentation results better respect the image's local and global consistency. Experiments show that the proposed method achieves state-of-the-art performance, with an mIoU of 70.5% on the PASCAL VOC 2012 test set and an mIoU of 34.4% on the MS-COCO 2014 val set. (c) 2021 Elsevier Ltd. All rights reserved.
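
    A minimal sketch of the local-consistency step, assuming pixels of the same superpixel should share class scores, so class activation maps are averaged within each superpixel; the alternating two-module training of the paper is not reproduced here.

```python
import numpy as np

def superpixel_smooth(cam, superpixels):
    # cam: (K, H, W) class activation maps; superpixels: (H, W) integer labels.
    # Pixels in the same superpixel are assumed to share a class, so each class
    # score is replaced by its mean over the superpixel (a simple instance of
    # the local-consistency idea).
    K, H, W = cam.shape
    sp = superpixels.ravel()
    n_sp = sp.max() + 1
    out = np.empty_like(cam)
    for c in range(K):
        sums = np.bincount(sp, weights=cam[c].ravel(), minlength=n_sp)
        counts = np.bincount(sp, minlength=n_sp)
        out[c] = (sums / np.maximum(counts, 1))[sp].reshape(H, W)
    return out

cam = np.random.rand(3, 60, 80).astype(np.float32)
sp = (np.arange(60 * 80) // 100).reshape(60, 80)   # toy superpixel label map
print(superpixel_smooth(cam, sp).shape)            # (3, 60, 80)
```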

    Semantic clustering based deduction learning for image recognition and classification

    Ma, Wenchi; Tu, Xuemin; Luo, Bo; Wang, Guanghui...
    9 pages
    Abstract: This paper proposes semantic clustering based deduction learning that mimics the learning and thinking process of human brains. Human beings make judgments based on experience and cognition, and as a result, no one would recognize an unknown animal as a car. Inspired by this observation, we propose to train deep learning models with a clustering prior that guides them to deduce and summarize from classification attributes, such as a cat belonging to animals and a car to vehicles. The proposed approach realizes high-level clustering in the semantic space, enabling the model to deduce the relations among various classes during the learning process. In addition, the paper introduces a semantic prior based random search for opposite labels to ensure a smooth distribution of the clustering and the robustness of the classifiers. The proposed approach is supported theoretically and empirically through extensive experiments. We compare performance against state-of-the-art classifiers on popular benchmarks, and the generalization ability is verified by adding label noise to the datasets. Experimental results demonstrate the superiority of the proposed approach. (c) 2021 Elsevier Ltd. All rights reserved.
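
    A hedged sketch of a clustering-prior loss in the spirit of the abstract: the class-to-cluster mapping and the superclass term below are illustrative assumptions, and the semantic-prior random search for opposite labels is not shown.

```python
import torch
import torch.nn.functional as F

# Class -> semantic-cluster mapping (e.g. cat/dog/bird -> animals, car/truck/bus
# -> vehicles). The mapping, the loss form and the weight are assumptions.
cluster_of = torch.tensor([0, 0, 0, 1, 1, 1])   # 6 fine classes, 2 semantic clusters

def deduction_loss(logits, labels, lam=0.5):
    # Fine-grained cross-entropy plus a term that asks the probability mass
    # inside the correct semantic cluster to be high, so the model also learns
    # superclass ("deduction") relations between classes.
    ce = F.cross_entropy(logits, labels)
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(cluster_of, int(cluster_of.max()) + 1).float()  # (6, 2)
    cluster_probs = probs @ onehot              # sum class probabilities per cluster
    cluster_ce = F.nll_loss(torch.log(cluster_probs + 1e-8), cluster_of[labels])
    return ce + lam * cluster_ce

logits = torch.randn(8, 6, requires_grad=True)
labels = torch.randint(0, 6, (8,))
loss = deduction_loss(logits, labels)
loss.backward()
print(float(loss))
```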

    A cascaded nested network for 3T brain MR image segmentation guided by 7T labeling

    Wei, Jie; Wu, Zhengwang; Wang, Li; Bui, Toan Duc...
    12 pages
    Abstract: Accurate segmentation of the brain into gray matter, white matter, and cerebrospinal fluid using magnetic resonance (MR) imaging is critical for the visualization and quantification of brain anatomy. Compared to 3T MR images, 7T MR images exhibit higher tissue contrast, which is conducive to accurate tissue delineation for training segmentation models. In this paper, we propose a cascaded nested network (CaNes-Net) for segmentation of 3T brain MR images, trained with tissue labels delineated from the corresponding 7T images. We first train a nested network (Nes-Net) for a rough segmentation. The second Nes-Net uses tissue-specific geodesic distance maps as contextual information to refine the segmentation. This process is iterated to build CaNes-Net with a cascade of Nes-Net modules that gradually refine the segmentation. To alleviate the misalignment between 3T and corresponding 7T MR images, we incorporate a correlation coefficient map to allow well-aligned voxels to play a more important role in supervising the training process. We compared CaNes-Net with the SPM and FSL tools, as well as four deep learning models, on 18 adult subjects and the ADNI dataset. Our results indicate that CaNes-Net reduces segmentation errors caused by the misalignment and improves segmentation accuracy substantially over the competing methods. (c) 2021 Elsevier Ltd. All rights reserved.
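
    A minimal sketch of the correlation-coefficient weighting described above, assuming the map lies in [0, 1] so well-aligned voxels dominate the segmentation loss; the network architecture and the geodesic-distance context maps are omitted.

```python
import torch
import torch.nn.functional as F

def weighted_seg_loss(logits, labels_7t, corr_map):
    # logits: (B, C, D, H, W) predictions of the segmentation network on 3T images;
    # labels_7t: (B, D, H, W) tissue labels delineated from the aligned 7T images;
    # corr_map: (B, D, H, W) local 3T-7T correlation coefficients, assumed in [0, 1].
    # Well-aligned voxels (high correlation) carry more weight in training.
    per_voxel = F.cross_entropy(logits, labels_7t, reduction='none')
    return (corr_map * per_voxel).sum() / corr_map.sum().clamp_min(1e-6)

logits = torch.randn(1, 3, 8, 32, 32, requires_grad=True)   # GM / WM / CSF scores
labels = torch.randint(0, 3, (1, 8, 32, 32))
corr = torch.rand(1, 8, 32, 32)
loss = weighted_seg_loss(logits, labels, corr)
loss.backward()
print(float(loss))
```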

    Semi-supervised node classification via adaptive graph smoothing networks

    Zheng, Ruigang; Chen, Weifu; Feng, Guocan
    14 pages
    Abstract: An inspection of current graph neural networks suggests reconsidering the computational aspect of the final aggregation. We consider such aggregations to perform prediction smoothing, and we attribute their potential drawbacks to the inter-class interference implied by the underlying graphs. We aim to weaken the inter-class connections so that aggregations focus more on intra-class relations, and to produce smooth predictions according to the weakened graph. We apply a metric learning module to learn new edge weights and combine entropy losses to ensure correspondence between the predictions and the learnt distances, so that the weights of inter-class edges are reduced and predictions are smoothed according to the modified graph. Experiments on four citation networks and a Wiki network show that, in comparison with other state-of-the-art graph neural networks, the proposed algorithm improves classification accuracy. (c) 2021 Elsevier Ltd. All rights reserved.
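
    A minimal sketch of reweighting the existing edges with a learned metric and smoothing predictions over the reweighted graph; the single propagation step, the exp(-distance) weights and the row normalization are assumptions, and the entropy losses of the paper are not shown.

```python
import torch
import torch.nn as nn

class AdaptiveSmoothing(nn.Module):
    # Learns a metric on node embeddings, reweights the existing edges by
    # exp(-d(h_i, h_j)) so dissimilar (presumably inter-class) edges get small
    # weights, then smooths the predictions over the reweighted graph.
    def __init__(self, dim, metric_dim=16):
        super().__init__()
        self.metric = nn.Linear(dim, metric_dim)

    def forward(self, h, logits, edge_index):
        src, dst = edge_index                                  # (E,), (E,)
        z = self.metric(h)
        d = (z[src] - z[dst]).pow(2).sum(-1)                   # learned edge distances
        w = torch.exp(-d)                                      # new edge weights
        n = h.size(0)
        adj = torch.zeros(n, n, device=h.device)
        adj[src, dst] = w
        adj = adj + torch.eye(n, device=h.device)              # keep each node's own prediction
        adj = adj / adj.sum(1, keepdim=True)                   # row-normalize
        return adj @ torch.softmax(logits, dim=1)              # one smoothing step

h = torch.randn(5, 8)           # node embeddings
logits = torch.randn(5, 3)      # raw class predictions
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
print(AdaptiveSmoothing(8)(h, logits, edges).shape)   # torch.Size([5, 3])
```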