Abstract: Recent advances in Big Data have opened up the opportunity to develop competitive Global Forecasting Models (GFM) that simultaneously learn from many time series. Although the concept of series relatedness has been heavily exploited with GFMs to explain their superiority over local statistical benchmarks, this concept remains largely under-investigated in an empirical setting. Hence, this study attempts to explore the factors that affect GFM performance by simulating a number of datasets with controllable characteristics. The controlled factors span the homogeneity/heterogeneity of the series, the complexity of patterns in the series, the complexity of the forecasting models, and the lengths and number of series. We simulate time series from simple Data Generating Processes (DGP), such as Auto-Regressive (AR), Seasonal AR and Fourier Terms, to complex DGPs, such as the Chaotic Logistic Map, Self-Exciting Threshold Auto-Regressive models and Mackey-Glass Equations. We perform experiments on these datasets using Recurrent Neural Networks (RNN), Feed-Forward Neural Networks, Pooled Regression models and Light Gradient Boosting Models (LGBM) built as GFMs, and compare their performance against standard statistical forecasting techniques. Our experiments demonstrate that, with respect to GFM performance, relatedness is closely associated with other factors such as the availability of data, the complexity of the data and the complexity of the forecasting technique used. Also, techniques with complex non-linear modelling capabilities, such as RNNs and LGBMs, are competitive when built as GFMs under challenging forecasting scenarios such as short series, heterogeneous series and minimal prior knowledge of the data patterns. (c) 2021 Elsevier Ltd. All rights reserved.
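As a minimal sketch of one of the complex DGPs named above, the chaotic logistic map x_{t+1} = r * x_t * (1 - x_t) can be simulated as follows. The growth rate r = 3.99, the initial conditions and the optional noise level are illustrative assumptions, not the study's exact settings:

```python
import numpy as np

def simulate_logistic_map(n_steps, r=3.99, x0=0.5, noise_sd=0.0, seed=None):
    """Simulate one series from the chaotic logistic map x_{t+1} = r*x_t*(1-x_t).

    For r close to 4 the map is chaotic; values stay in [0, 1].
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        x[t] = r * x[t - 1] * (1.0 - x[t - 1])
        if noise_sd > 0:
            # Optional observation noise, clipped to keep the series in [0, 1].
            x[t] = np.clip(x[t] + rng.normal(0.0, noise_sd), 0.0, 1.0)
    return x

# A small "global" dataset: several series sharing the same DGP but
# differing in their initial condition (a simple notion of relatedness).
series = [simulate_logistic_map(100, x0=x0) for x0 in (0.11, 0.42, 0.73)]
```

A GFM would then be trained across all entries of `series` jointly, whereas a local benchmark fits one model per series.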
Abstract: Instance segmentation is one of the most challenging tasks in computer vision, requiring each instance to be separated at the pixel level. To date, a low-resolution binary mask is the dominant paradigm for representing instance masks. For example, the size of the predicted mask in Mask R-CNN is usually 28 x 28. Generally, a low-resolution mask cannot capture the object details well, while a high-resolution mask dramatically increases the training complexity. In this work, we propose a flexible and effective approach to encode the high-resolution structured mask into a compact representation which shares the advantages of high quality and low complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single-shot instance segmentation framework can be constructed by extending an existing one-stage detector with a mask branch for this instance representation. Our model shows its superiority over explicit contour-based pipelines in accuracy with similar computational complexity. We also evaluate our method for video instance segmentation, achieving promising results on the YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet
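The abstract does not state how the structured mask is compressed; one common choice for such a linear encode/decode scheme is PCA over flattened training masks, sketched below. The 28 x 28 mask size, the 60-dimensional code and the random training masks are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def fit_mask_codebook(masks, n_components=60):
    """Learn a linear codebook from flattened binary masks via PCA (SVD).

    masks: (N, H*W) array of {0, 1} training masks.
    Returns the mean mask and the top-K principal directions.
    """
    mean = masks.mean(axis=0)
    _, _, vt = np.linalg.svd(masks - mean, full_matrices=False)
    return mean, vt[:n_components]            # components: (K, H*W)

def encode(mask, mean, components):
    """Project a high-resolution mask onto a compact K-dim code."""
    return (mask - mean) @ components.T

def decode(code, mean, components, thresh=0.5):
    """Reconstruct and binarize a mask from its compact code."""
    return ((code @ components) + mean) >= thresh

rng = np.random.default_rng(0)
train = (rng.random((200, 28 * 28)) > 0.5).astype(float)
mean, comps = fit_mask_codebook(train, n_components=60)
code = encode(train[0], mean, comps)
```

The detector then regresses the K-dimensional code instead of a full H x W grid, trading a small reconstruction error for a much cheaper prediction target.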
Abstract: Because gliomas grow infiltratively, the tumor boundary is usually fused with the surrounding brain tissue, which prevents the brain tumor structure from being segmented accurately from single-modal images. Multi-modal images, in contrast, complement each other with respect to the tumor's inherent heterogeneity and external boundary, providing complementary features and outlines. Besides, they can retain the structural characteristics of brain diseases from multiple angles. However, due to the particularities of multi-modal medical image sampling, which increase uneven data density and dense vascular tumor structure, the glioma may exhibit atypically fuzzy boundaries and more noise. To solve this problem, this paper proposes a dual-path network based on multi-modal feature fusion (MFF-DNet). Firstly, the proposed network uses different kernel-multiplexing methods to combine a large-scale perceptual domain with non-linear mapping features, which effectively enhances the coherence of information flow. Then, the overlapping frequency and the vanishing-gradient phenomenon are reduced by residual and dense connections, which alleviate the mutual influence of the multi-modal channels. Finally, a dual-path model based on the DenseNet network and feature pyramid networks (FPN) is established to fuse low-level, middle-level, and high-level features; this increases the diversification of the glioma's non-linear structural features and improves segmentation precision. Extensive ablation experiments show the effectiveness of the proposed model. The precision for the whole tumor and the core tumor reaches 0.92 and 0.90, respectively.
Abstract: Multi-view clustering has attracted increasing attention because it can utilize the complementary and compatible information in multi-view data sets. In many graph-based multi-view clustering approaches, the graph quality is important since it influences the subsequent clustering performance. Therefore, learning a high-quality similarity graph is desired. In this paper, we propose a novel clustering method named Self-weighting Multi-view Spectral Clustering based on Nuclear Norm (SMSC_NN). Specifically, to fully utilize the multiple view features, a common consensus representation is learned. Moreover, to capture the principal components of the various view features, the nuclear norm is introduced so that the view-specific information is well explored. Further, because each view feature denotes a specific property, adaptive weights are assigned to the views instead of equal weights. To verify the effectiveness of the proposed method, clustering experiments are conducted on four multi-view data sets. Extensive experimental results demonstrate the superiority of the proposed method compared with state-of-the-art multi-view clustering approaches. In addition, the proposed approach is evaluated on the Cal101-20 data set with "salt and pepper" noise, and the results verify that SMSC_NN remains robust to noise.
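For reference, the nuclear norm used above is the sum of a matrix's singular values (a convex surrogate for rank), and its proximal operator, singular value thresholding, is the standard building block when such a term appears in an optimization objective. A minimal sketch, not the paper's full solver:

```python
import numpy as np

def nuclear_norm(Z):
    """||Z||_* = sum of singular values; encourages a low-rank representation."""
    return float(np.linalg.svd(Z, compute_uv=False).sum())

def svt(Z, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_*.

    Shrinks each singular value by tau and clips at zero, yielding the
    closest matrix under the Frobenius norm with a nuclear-norm penalty.
    """
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt
```

In alternating-minimization schemes, `svt` would update the consensus representation while other subproblems update the per-view graphs and their adaptive weights.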
Abstract: Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk R_adv, encouraging both the benign example x and its adversarially perturbed neighborhoods within the ℓ_p-ball to be predicted as the ground-truth label. In this paper, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks (i.e., R_stand and R_rob), defined with respect to the benign example and its neighborhoods, respectively. The motivation is to explicitly and jointly enhance the accuracy and the adversarial robustness. We prove that R_adv is upper-bounded by R_stand + R_rob, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk enforces the benign example to be correctly predicted, while minimizing the robust risk encourages the predictions of the neighboring examples to be consistent with the prediction of the benign example. Besides, since R_rob is independent of the ground-truth label, RT naturally extends to the semi-supervised mode (i.e., SRT), further enhancing its effectiveness. Moreover, we extend the ℓ_p-bounded neighborhood to a general case covering different types of perturbations, such as the pixel-wise perturbation (i.e., x + δ) or the spatial perturbation (i.e., Ax + b). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT over state-of-the-art methods for defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. Our work may shed light on the understanding of universal model robustness and the potential of unlabeled samples. The code for reproducing the main results is available at https://github.com/THUYimingLi/Semi-supervised_Robust_Training.
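A minimal sketch of the risk decomposition described above: a labeled standard risk on the benign predictions plus a label-free consistency (robust) risk between benign and perturbed predictions. The KL-based consistency term and the weight lam are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def standard_risk(logits, y):
    """Cross-entropy on the benign examples; requires the labels y."""
    p = softmax(logits)
    return -np.log(p[np.arange(len(y)), y]).mean()

def robust_risk(logits_benign, logits_perturbed):
    """KL(p_benign || p_perturbed): prediction consistency, label-free.

    Because no label appears here, this term can be computed on
    unlabeled data, which is what enables the semi-supervised mode.
    """
    p = softmax(logits_benign)
    q = softmax(logits_perturbed)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

def rt_loss(logits, logits_pert, y, lam=1.0):
    """R_stand + lam * R_rob, the joint objective of robust training."""
    return standard_risk(logits, y) + lam * robust_risk(logits, logits_pert)
```

When benign and perturbed predictions agree, the robust term vanishes and the loss reduces to the standard risk.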
Abstract: Pose-guided person image generation means generating a photo-realistic person image conditioned on an input person image and a desired pose. This task requires spatial manipulation of the source image according to the target pose. However, convolutional neural networks (CNNs) are inherently limited in modelling geometric transformations due to the fixed geometric structures of their building modules, i.e., convolution, pooling and unpooling, which cannot handle the large motion and occlusions caused by large pose transforms. This paper introduces a novel two-stream context-aware appearance transfer network to address these challenges. It is a three-stage architecture consisting of a source stream and a target stream. Each stage features an appearance transfer module, a multi-scale context module and two-stream feature fusion modules. The appearance transfer module handles large motion by finding the dense correspondence between the two-stream feature maps and then transferring the appearance information from the source stream to the target stream. The multi-scale context module handles occlusion via contextual modeling, which is achieved by atrous convolutions with different sampling rates. Both quantitative and qualitative results indicate that the proposed network can effectively handle challenging cases of large pose transforms while retaining the appearance details. Compared with state-of-the-art approaches, it achieves comparable or superior performance using far fewer parameters while being significantly faster.
Abstract: Assessing whether an image comes from a specific device is fundamental in many application scenarios. The most promising techniques for this problem rely on the Photo Response Non-Uniformity (PRNU), a unique trace left during image acquisition. A PRNU fingerprint is computed from several images of a given device, and is then compared with the probe's residual noise by means of correlation. However, such a comparison requires that the PRNUs be synchronized: even small image transformations can spoil this task. Most attempts to solve the registration problem rely on a time-consuming brute-force search, which is prone to missed detections and false positives. In this paper, the problem is addressed from a computer vision perspective, exploiting recent image registration techniques based on deep learning and focusing on scaling and rotation transformations. Experiments show that the proposed method is both more accurate and faster than state-of-the-art approaches.
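The fingerprint/correlation pipeline described above can be sketched as follows, with synthetic noise residuals standing in for denoised camera images. The plain averaging estimator and normalized correlation are simplifications of practical PRNU pipelines, shown only to illustrate the matching step (the registration problem the paper addresses is not modeled here):

```python
import numpy as np

def estimate_fingerprint(residuals):
    """Estimate a device's PRNU fingerprint by averaging the noise
    residuals W_i = I_i - denoise(I_i) of several of its images."""
    return np.mean(residuals, axis=0)

def ncc(a, b):
    """Normalized cross-correlation between a probe residual and a fingerprint."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
prnu = rng.normal(size=(64, 64))                              # ground-truth sensor pattern
residuals = prnu + rng.normal(scale=2.0, size=(10, 64, 64))   # noisy observations of it
fp = estimate_fingerprint(residuals)
probe_same = prnu + rng.normal(scale=2.0, size=(64, 64))      # probe from the same device
probe_other = rng.normal(size=(64, 64))                       # probe from another device
```

A decision is taken by thresholding the correlation: the same-device probe correlates with the fingerprint well above the chance level of the other-device probe. Even a small misalignment of `probe_same` would collapse this correlation, which is why registration matters.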
Abstract: Recent advances in Deep Reinforcement Learning (DRL) demonstrate its potential for solving Combinatorial Optimization (CO) problems. DRL shows advantages over traditional methods in both scalability and computational efficiency. However, the DRL problems transformed from CO problems usually have a huge state space, and the main challenge in solving them has shifted from high computational complexity to high sample complexity. Credit assignment determines the contribution of each internal decision to the final success or failure, and it has been shown to be effective in reducing the sample complexity of the training process. In this paper, we resort to a model-based reinforcement learning method to assign credits for model-free DRL methods. Since heuristic methods play an important role in state-of-the-art solutions to CO problems, we propose using a model to represent that heuristic knowledge and deriving the credit assignment from the model. This model-based credit assignment helps the model-free DRL perform a more effective exploration, and the data collected by the model-free DRL refines the model continuously as training progresses. Extensive experiments on various CO problems with different settings show that our framework outperforms previous state-of-the-art methods in both performance and training efficiency.
Abstract: Siamese networks have achieved great success in visual tracking thanks to their speed and accuracy. However, tracking an object precisely and robustly remains challenging. One reason is that multiple types of features are required to achieve good precision and robustness, which cannot be obtained in a single training phase. Moreover, Siamese networks usually struggle with online adaptation. In this paper, we present a novel two-stage aware attentional Siamese network for tracking (Ta-ASiam). Concretely, we first propose a position-aware and an appearance-aware training strategy to optimize different layers of the Siamese network. By introducing diverse training patterns, the two types of required features can be captured simultaneously. Then, following the rule of feature distribution, an effective feature selection module is constructed by combining both channel and spatial attention networks to adapt to rapid appearance changes of the object. Extensive experiments on various recent benchmarks demonstrate the effectiveness of our method, which significantly outperforms state-of-the-art trackers.
Abstract: Incorporating depth (D) information into RGB images has proven effective and robust for semantic segmentation. However, the fusion between them is not trivial due to the discrepancy in their inherent physical meaning: RGB represents appearance information while D represents depth information. In this paper, we propose a co-attention network (CANet) to build sound interaction between RGB and depth features. The key part of the CANet is the co-attention fusion part, which includes three modules. Specifically, the position and channel co-attention fusion modules adaptively fuse RGB and depth features in the spatial and channel dimensions. An additional fusion co-attention module further integrates the outputs of the position and channel co-attention fusion modules to obtain a more representative feature, which is used for the final semantic segmentation. Extensive experiments demonstrate the effectiveness of the CANet in fusing RGB and depth features, achieving state-of-the-art performance on two challenging RGB-D semantic segmentation datasets, i.e., NYUDv2 and SUN-RGBD.
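As an illustration of the kind of channel-wise co-attention fusion described above: descriptors from both modalities jointly drive one gate that reweights RGB against depth per channel. The gating design, shapes and names here are hypothetical, not CANet's actual architecture:

```python
import numpy as np

def channel_coattention_fuse(rgb, depth, W):
    """Sketch of channel co-attention fusion of two modalities.

    rgb, depth: (C, H, W) feature maps; W: (C, 2C) learned projection.
    A single gate, computed from the pooled descriptors of BOTH inputs
    (hence "co"-attention), mixes the two streams per channel.
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    # Global average pooling gives one descriptor per modality.
    desc = np.concatenate([rgb.mean(axis=(1, 2)), depth.mean(axis=(1, 2))])
    gate = sigmoid(W @ desc)                              # (C,) channel weights
    return gate[:, None, None] * rgb + (1.0 - gate)[:, None, None] * depth

rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 2, 2))
depth = rng.normal(size=(4, 2, 2))
fused = channel_coattention_fuse(rgb, depth, np.zeros((4, 8)))
```

With an untrained (zero) projection the gate is 0.5 everywhere, i.e., a plain average of the two streams; training the projection lets each channel favor whichever modality is more informative.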