Journal information
Pattern Recognition
Pergamon
ISSN: 0031-3203
Indexed in: SCI, AHCI, ISTP, EI
Officially published
Indexed years:

    The residual generator: An improved divergence minimization framework for GAN

    Gnanha, Aurele Tohokantche; Cao, Wenming; Mao, Xudong; Wu, Si...
    15 pages
    Abstract: GAN is a generative modeling framework that has been proven able to minimize various types of divergence measures under an optimal discriminator. However, there is a gap between the loss function of GAN used in theory and in practice. In theory, the proof of Jensen divergence minimization involves the min-max criterion, but in practice the non-saturating criterion is used instead to avoid vanishing gradients. We argue that this formulation of divergence minimization via GAN is biased and may yield poor convergence of the algorithm. In this paper, we propose the Residual Generator for GAN (Rg-GAN), inspired by closed-loop control theory, to bridge the gap between theory and practice. Rg-GAN minimizes the residual between the loss of the generated data being real and the loss of the generated data being fake from the perspective of the discriminator. In this setting, the loss terms of the generator depend only on the generated data and therefore contribute to the optimization of the model. We formulate the residual generator for standard GAN and least-squares GAN and show that they are equivalent to the minimization of reverse-KL divergence and a novel instance of f-divergence, respectively. Furthermore, we prove that Rg-GAN can be reduced to Integral Probability Metric (IPM) GANs (e.g., Wasserstein GAN) and bridges the gap between IPMs and f-divergences. Additionally, we further improve Rg-GAN by proposing a loss function for the discriminator with better discrimination ability. Experiments on synthetic and natural image data sets show that Rg-GAN is robust to mode collapse and improves the generation quality of GAN in terms of FID and IS scores. (c) 2021 Elsevier Ltd. All rights reserved.
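As a toy illustration of the residual idea described in this abstract, the sketch below contrasts the usual non-saturating generator loss with a residual-style loss for a single discriminator output. The function names and the exact form of the residual are illustrative assumptions, not the paper's objective.

```python
import math

def nonsaturating_g_loss(d_fake):
    # Non-saturating generator loss used in practice: -log D(G(z)).
    return -math.log(d_fake)

def residual_g_loss(d_fake):
    # Residual-style sketch: the loss of the generated sample being
    # classified real minus the loss of it being classified fake, both
    # from the discriminator's view. Hypothetical form for illustration.
    loss_real = -math.log(d_fake)        # cost of calling the fake "real"
    loss_fake = -math.log(1.0 - d_fake)  # cost of calling the fake "fake"
    return loss_real - loss_fake
```

When the discriminator is confident a sample is fake (small `d_fake`), the residual remains large and signed, so the generator still receives a useful training signal.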

    Blitz-SLAM: A semantic SLAM in dynamic environments

    Fan, Yingchun; Zhang, Qichi; Tang, Yuliang; Liu, Shaofen...
    14 pages
    Abstract: A static environment is a prerequisite for most visual simultaneous localization and mapping (SLAM) systems. Such a strong assumption limits the practical application of most existing SLAM systems. When moving objects enter the camera's field of view, dynamic matching points will directly interrupt camera localization, and the noise blocks formed by moving objects will contaminate the constructed map. In this paper, a semantic SLAM system working in indoor dynamic environments, named Blitz-SLAM, is proposed. The noise blocks in the local point cloud are removed by combining the advantages of the semantic and geometric information of mask, RGB, and depth images. The global point cloud map can be obtained by merging the local point clouds. We evaluate Blitz-SLAM on the TUM RGB-D dataset and in a real-world environment. The experimental results demonstrate that Blitz-SLAM can work robustly in dynamic environments and simultaneously generate a clean and accurate global point cloud map. (c) 2021 Elsevier Ltd. All rights reserved.
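A minimal sketch of the map-cleaning step the abstract describes: points whose pixels fall inside a semantic mask of dynamic objects are dropped before local clouds are merged. The data layout and helper names are assumptions; Blitz-SLAM's actual fusion of mask, RGB, and depth cues is more involved.

```python
def filter_dynamic_points(points, dynamic_mask):
    # points: list of (u, v, depth) samples; dynamic_mask: set of (u, v)
    # pixels flagged as dynamic by a semantic segmentation network.
    # Keep only points that project onto static regions.
    return [(u, v, d) for (u, v, d) in points if (u, v) not in dynamic_mask]

def merge_local_clouds(clouds):
    # Global map sketched as the concatenation of cleaned local clouds
    # (a real system would also transform them into a common frame).
    merged = []
    for cloud in clouds:
        merged.extend(cloud)
    return merged
```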

    Spatiotemporal consistency-enhanced network for video anomaly detection

    Hao, Yi; Li, Jie; Wang, Nannan; Wang, Xiaoyu...
    11 pages
    Abstract: Video anomaly detection aims to detect abnormal segments in a video sequence, which is a key problem in video surveillance. Based on deep prediction methods, we propose a spatiotemporal consistency-enhanced network to generate spatiotemporally consistent predictions. A 3D CNN-based encoder and a 2D CNN-based decoder constitute the main part of our model. A resampling strategy is applied to the latent space vector when the model is trained on normal data, which causes the model to perform poorly when the data include abnormal samples. Moreover, we combine an input clip with a generated frame into a reformed video clip, which is then fed into a discriminator constructed with a 3D CNN to evaluate the consistency of the input clip. Owing to the adversarial training between the generator and discriminator, the spatiotemporal consistency of the generated results is enhanced. During the testing stage, abnormal data generate different appearance and motion changes, which impair our model's ability to predict spatiotemporally consistent future images. The prediction quality gap between normal and anomalous contents is then used to infer whether anomalies occur. Extensive experiments confirm that the proposed method achieves state-of-the-art performance on three benchmark datasets, including ShanghaiTech, CUHK Avenue, and UCSD Ped2. (c) 2021 Elsevier Ltd. All rights reserved.
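The "prediction quality gap" at test time is commonly measured with PSNR between the predicted and observed frame: poorly predicted frames are scored as more anomalous. The sketch below assumes flattened pixel lists in [0, 1] and a simple monotone mapping to a score; it illustrates the scoring principle, not this paper's exact formula.

```python
import math

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio between a predicted and an observed
    # frame, both given as flat pixel lists in [0, 1].
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

def anomaly_score(pred, target):
    # Monotone mapping: low PSNR (bad prediction) -> score near 1,
    # perfect prediction -> score 0.
    return 1.0 / (1.0 + psnr(pred, target))
```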

    Financial time series forecasting with multi-modality graph neural network

    Cheng, Dawei; Yang, Fangzhou; Xiang, Sheng; Liu, Jin...
    10 pages
    Abstract: Financial time series analysis plays a central role in hedging market risks and optimizing investment decisions. This is a challenging task, as the problems are always accompanied by multi-modality streams and lead-lag effects. For example, the price movements of stocks are reflections of complicated market states diffusing at different speeds, including historical price series, media news, associated events, etc. Furthermore, the financial industry requires forecasting models to be interpretable and compliant. Therefore, in this paper, we propose a multi-modality graph neural network (MAGNN) to learn from these multimodal inputs for financial time series prediction. The heterogeneous graph network is constructed with the sources as nodes and the relations in our financial knowledge graph as edges. To ensure model interpretability, we leverage a two-phase attention mechanism for joint optimization, allowing end-users to investigate the importance of inner-modality and inter-modality sources. Extensive experiments on real-world datasets demonstrate the superior performance of MAGNN in financial market prediction. Our method provides investors with a profitable as well as interpretable option and enables them to make informed investment decisions. (c) 2021 Elsevier Ltd. All rights reserved.
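The two-phase attention the abstract mentions can be sketched as softmax weighting first over the sources within each modality, then over the fused modalities; the per-modality weights are what an end-user would inspect. Scalar features and externally supplied attention scores are simplifying assumptions for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def two_phase_attention(modalities, inner_scores, inter_scores):
    # modalities: {name: list of scalar source features}.
    # Phase 1: attend over sources within each modality.
    fused = {}
    for name, feats in modalities.items():
        w = softmax(inner_scores[name])
        fused[name] = sum(wi * f for wi, f in zip(w, feats))
    # Phase 2: attend over modalities; these weights are the
    # interpretability hook (which modality drove the prediction).
    names = list(fused)
    w = softmax([inter_scores[n] for n in names])
    output = sum(wi * fused[n] for wi, n in zip(w, names))
    return output, dict(zip(names, w))
```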

    Towards robust explanations for deep neural networks

    Dombrowski, Ann-Kathrin; Anders, Christopher J.; Mueller, Klaus-Robert; Kessel, Pan...
    20 pages
    Abstract: Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches. (c) 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/ )
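Of the three techniques listed, "smoothing activation functions" typically means replacing ReLU with a softplus whose sharpness is tunable: a smaller `beta` gives a smoother network with smaller curvature (Hessian), while a large `beta` recovers ReLU. This is a standard construction consistent with the abstract, sketched here for illustration.

```python
import math

def relu(x):
    return max(0.0, x)

def softplus(x, beta=1.0):
    # Smooth surrogate for ReLU: (1/beta) * log(1 + exp(beta * x)).
    # beta -> infinity recovers ReLU; smaller beta means smoother
    # activations and hence smaller network curvature.
    return math.log1p(math.exp(beta * x)) / beta
```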

    Towards open-set touchless palmprint recognition via weight-based meta metric learning

    Shao, Huikai; Zhong, Dexing
    12 pages
    Abstract: Touchless biometrics has become significant in the wake of novel coronavirus 2019 (COVID-19). Due to its convenience, user-friendliness, and high accuracy, touchless palmprint recognition shows great potential when hygiene issues are considered during COVID-19. However, previous palmprint recognition methods have mainly focused on the closed-set scenario. In this paper, a novel Weight-based Meta Metric Learning (W2ML) method is proposed for accurate open-set touchless palmprint recognition, where only a part of the categories is seen during training. A deep metric learning-based feature extractor is learned in a meta way to improve generalization ability. Multiple sets are sampled randomly to define support and query sets, which are further combined into meta sets to constrain the set-based distances. In particular, hard sample mining and weighting are adopted to select informative meta sets to improve efficiency. Finally, embeddings with obvious inter-class and intra-class differences are obtained as features for palmprint identification and verification. Experiments are conducted on four palmprint benchmarks including fourteen constrained and unconstrained palmprint datasets. The results show that our W2ML method is more robust and efficient in dealing with the open-set palmprint recognition issue compared to the state of the art, with accuracy increased by up to 9.11% and the Equal Error Rate (EER) decreased by up to 2.97%. (c) 2021 Elsevier Ltd. All rights reserved.
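A common hard-mining rule in metric learning, in the spirit of the mining step this abstract describes: for each query embedding, the hardest positive is the farthest same-class support sample and the hardest negative the closest different-class one. The paper's set-based weighting is more elaborate; this sketch only illustrates the basic rule.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def hardest_pairs(support, labels_s, query, labels_q):
    # For each query embedding, return (hardest positive distance,
    # hardest negative distance) against the support set. A good
    # embedding keeps the positive distance below the negative one.
    out = []
    for q, lq in zip(query, labels_q):
        pos = max((euclidean(q, s) for s, ls in zip(support, labels_s)
                   if ls == lq), default=None)
        neg = min((euclidean(q, s) for s, ls in zip(support, labels_s)
                   if ls != lq), default=None)
        out.append((pos, neg))
    return out
```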

    Universal multi-source domain adaptation for image classification

    Yang, Zhen; Hu, Haifeng; Wu, Xiaofu; Yin, Yueming...
    13 pages
    Abstract: Unsupervised domain adaptation (DA) enables intelligent models to learn transferable knowledge from a labeled source domain and adapt to a similar but unlabeled target domain. Studies have shown that knowledge can be transferred from one source domain to another unknown target domain, called Universal DA (UDA). However, in real-world applications there is often more than one source domain to be exploited for DA. In this paper, we formally propose a more general domain adaptation setting for image classification, universal multi-source DA (UMDA), where the label sets of multiple source domains can be different and the label set of the target domain is completely unknown. The main challenge in UMDA is to identify the common label set between each source and target domain and keep the model scalable as the number of source domains increases. In the face of this challenge, we propose a universal multi-source adaptation network (UMAN) to solve the DA problem without increasing the complexity of the model in various UMDA settings. In UMAN, the reliability of each known class belonging to the common label set is estimated via a novel pseudo-margin vector and its weighted form, which helps adversarial training better align the distributions of the multiple source domains and the target domain. Moreover, a theoretical guarantee for UMAN is also provided. Extensive experimental results show that existing UDA and multi-source DA (MDA) methods cannot be directly deployed to UMDA, and the proposed UMAN achieves state-of-the-art performance in various UMDA settings. (c) 2021 Elsevier Ltd. All rights reserved.
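One simple way to build a margin-style reliability signal like the "pseudo-margin" this abstract names is the gap between the two largest predicted class probabilities: a large gap suggests a confidently shared class, a small gap flags a possibly unknown class. This exact definition is an assumption for illustration; the paper's pseudo-margin vector may be defined differently.

```python
def pseudo_margin(probs):
    # probs: predicted class probabilities for one target sample.
    # Gap between the two largest probabilities; larger gap = more
    # reliable evidence that the class is in the common label set.
    top = sorted(probs, reverse=True)
    return top[0] - top[1]
```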

    Action recognition via pose-based graph convolutional networks with intermediate dense supervision

    Shi, Lei; Zhang, Yifan; Cheng, Jian; Lu, Hanqing...
    9 pages
    Abstract: Pose-based action recognition has drawn considerable attention recently. Existing methods exploit joint positions to extract body-part features from the activation maps of a backbone CNN to assist human action recognition. However, there are two limitations: (1) the body-part features are used independently or simply concatenated to obtain a representation, so the prior knowledge about the structured correlations between body parts is not fully exploited; (2) the backbone CNN, from which the body-part features are extracted, is "lazy": it contents itself with identifying patterns in the most discriminative areas of the input, so the features extracted from other areas carry little information. This hampers the subsequent aggregation process and makes the model easily misled by training data bias. To address these problems, we encode the body-part features into a human-based spatiotemporal graph and employ a lightweight graph convolutional module to explicitly model the dependencies between body parts. Besides, we introduce a novel intermediate dense supervision that promotes the backbone CNN to treat all regions equally; it is simple and effective, with no extra parameters or computation. The proposed approach, namely the pose-based graph convolutional network (PGCN), is evaluated on three popular benchmarks, where it significantly outperforms the state-of-the-art methods. (c) 2021 Elsevier Ltd. All rights reserved.
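A graph convolution over body-part features, as in the module described above, is essentially `A_hat X W`: a normalized adjacency mixes each part's feature with its neighbours', then a learned weight matrix transforms the result. The sketch below uses plain nested lists and row normalization with self-loops; it shows the standard GCN layer shape, not PGCN's exact variant.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def normalize_adj(A):
    # Add self-loops, then row-normalize so each body part averages
    # over itself and its neighbours.
    n = len(A)
    A = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in A]

def gcn_layer(A, X, W):
    # One graph-convolution step over body-part features: A_hat X W.
    # A: adjacency between body parts, X: per-part features, W: weights.
    return matmul(matmul(normalize_adj(A), X), W)
```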

    Joint multi-label learning and feature extraction for temporal link prediction

    Ma, Xiaoke; Tan, Shiyin; Xie, Xianghua; Zhong, Xiaoxiong...
    12 pages
    Abstract: Networks derived from various disciplines of society and nature are dynamic and incomplete, and temporal link prediction has wide applications in recommendation systems, data mining systems, etc. Current algorithms first obtain features by exploiting the topological or latent structure of networks and then predict temporal links based on the obtained features. These algorithms are criticized for the separation of feature extraction and link prediction, which fails to fully characterize the dynamics of networks, resulting in undesirable performance. To overcome this problem, we propose a novel algorithm for joint multi-label learning and feature extraction (called MLjFE), where temporal link prediction and feature extraction are integrated into an overall objective function. The main advantage of MLjFE is that the features and the parameter matrix for temporal link prediction are learned simultaneously during the optimization procedure, which captures the dynamics of networks more precisely, improving performance. Experimental results on a number of artificial and real-world temporal networks demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods, showing that jointly learning feature extraction and temporal link prediction is promising. (c) 2021 Elsevier Ltd. All rights reserved.

    Delving deep into spatial pooling for squeeze-and-excitation networks

    Jin, Xin; Xie, Yanping; Wei, Xiu-Shen; Zhao, Bo-Rui...
    12 pages
    Abstract: Squeeze-and-Excitation (SE) blocks have demonstrated significant accuracy gains for state-of-the-art deep architectures by re-weighting channel-wise feature responses. The SE block is an architectural unit that integrates two operations: a squeeze operation that employs global average pooling to aggregate spatial convolutional features into a channel feature, and an excitation operation that learns instance-specific channel weights from the squeezed feature to re-weight each channel. In this paper, we revisit the squeeze operation in SE blocks and shed light on why and how to embed rich (both global and local) information into the excitation module at minimal extra cost. In particular, we introduce a simple but effective two-stage spatial pooling process: rich descriptor extraction and information fusion. The rich descriptor extraction step aims to obtain a set of diverse (i.e., global and especially local) deep descriptors that contain more informative cues than global average pooling. Meanwhile, absorbing the additional information delivered by these descriptors via a fusion step helps the excitation operation return more accurate re-weight scores in a data-driven manner. We validate the effectiveness of our method by extensive experiments on ImageNet for image classification and on MS-COCO for object detection and instance segmentation. In these experiments, our method achieves consistent improvements over SENets on all tasks, in some cases by a large margin. (c) 2021 Published by Elsevier Ltd.
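For reference, the baseline SE mechanism being revisited can be sketched as: squeeze each channel by global average pooling, gate the result, and rescale the channel. The optional `local_pool` argument fuses an extra local descriptor per channel, in the spirit of the two-stage pooling above. The bare sigmoid gate is a simplification (the real excitation uses two FC layers), and the fusion rule is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squeeze_excite(channels, local_pool=None):
    # channels: {name: list of spatial activations for that channel}.
    # Squeeze: global average pooling per channel. Excitation: a bare
    # sigmoid gate (simplified from the two-FC-layer original).
    weights = {}
    for name, feats in channels.items():
        g = sum(feats) / len(feats)          # global descriptor
        if local_pool is not None:
            g = 0.5 * (g + local_pool[name])  # fuse an extra local cue
        weights[name] = sigmoid(g)
    # Re-weight every spatial activation by its channel's gate value.
    return {name: [weights[name] * f for f in feats]
            for name, feats in channels.items()}
```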