Abstract: Supervised subspace projection is a major approach to dimensionality reduction in pattern recognition. At present, most supervised subspace projection algorithms are derived from the multidimensional extension of Fisher linear discriminant analysis (FDA), known as multidimensional Fisher discriminant analysis (MD-FDA). However, MD-FDA has two weaknesses: the projection vectors in the noise subspace cannot be ranked, and ill-conditioning of the within-class scatter matrix may cause severe numerical instability. This paper proposes generalized discriminant component analysis (GDCA), a generalization of MD-FDA, together with its kernelized forms, and gives rigorous mathematical proofs. Experiments on 5 validation data sets, drawn from the UCI Machine Learning Repository and our laboratory, verify the theoretical validity and technical advantages of GDCA and its kernelized forms, and demonstrate the effectiveness of the proposed method in comparison with 36 state-of-the-art dimensionality reduction algorithms. (c) 2021 Elsevier Ltd. All rights reserved.
Park, Chiwoo; Borth, David J.; Wilson, Nicholas S.; Hunter, Chad N.; ...
14 pages
Abstract: This paper presents a new approach to robust Gaussian process (GP) regression, yielding a non-parametric Bayesian regression estimate that is robust to outliers. Most existing approaches replace the outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced from a heavy-tailed distribution, such as the Laplace distribution or the Student-t distribution. However, a non-Gaussian likelihood requires computationally expensive approximate Bayesian computation for posterior inference. The proposed approach models an outlier as a noisy and biased observation of an unknown regression function; accordingly, the likelihood contains bias terms that explain the degree of deviation from the regression function. We introduce two bias models that handle the bias terms differently, treating a bias either as an unknown fixed quantity or as a random quantity. We detail how the biases can be estimated accurately, together with the other hyperparameters, by regularized maximum likelihood estimation. Conditioned on the bias estimates, robust GP regression reduces to a standard GP regression problem with analytical forms for the predictive mean and variance estimates. The proposed approach is therefore simple and computationally attractive. It also gives a robust and accurate GP estimate in many tested scenarios. For the numerical evaluation, we perform a comprehensive simulation study comparing the proposed approach with existing robust GP approaches under simulated scenarios with different outlier proportions and noise levels. The approach is applied to data from two measurement systems, where the predictors are based on robust environmental parameter measurements and the response variables come from more complex chemical sensing methods that contain a certain percentage of outliers. The utility of the measurement systems and the value of the environmental data are improved through the computationally efficient GP regression and bias model. (c) 2021 Elsevier Ltd. All rights reserved.
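The reduction to standard GP regression described in the abstract above can be illustrated with a small sketch. It assumes the bias estimates are already available (the regularized maximum likelihood step is not shown) and that the reduction amounts to subtracting each bias from its observation before applying the usual GP formulas. The RBF kernel, the noise variance, the toy data, and all function names are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Tuple


def rbf(a: float, b: float, length: float = 1.0) -> float:
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_predict(xs: List[float], ys: List[float], x_star: float,
               noise_var: float = 0.01) -> Tuple[float, float]:
    """Standard GP predictive mean and latent-function variance at x_star."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a) for a in xs]
    alpha = solve(K, ys)          # (K + noise_var * I)^-1 y
    v = solve(K, k_star)          # (K + noise_var * I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


def robust_gp_predict(xs: List[float], ys: List[float], biases: List[float],
                      x_star: float, noise_var: float = 0.01) -> Tuple[float, float]:
    """Assumed reduction: remove the estimated biases, then run a standard GP."""
    corrected = [y - b for y, b in zip(ys, biases)]
    return gp_predict(xs, corrected, x_star, noise_var)


# Toy data: one observation carries a large bias (an outlier).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [math.sin(x) for x in xs]
biases = [0.0, 0.0, 5.0, 0.0, 0.0]          # assumed to be estimated elsewhere
observed = [y + b for y, b in zip(clean, biases)]

clean_mean, clean_var = gp_predict(xs, clean, 2.0)
naive_mean, _ = gp_predict(xs, observed, 2.0)
robust_mean, robust_var = robust_gp_predict(xs, observed, biases, 2.0)
```

With exact bias estimates, `robust_mean` matches the prediction from uncontaminated data, whereas `naive_mean` is pulled toward the outlier. In practice the quality of the result depends on how well the biases are estimated.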
Abstract: To achieve high-resolution segmentation results, typical semantic segmentation models often require high-resolution inputs. However, high-resolution inputs inevitably incur a high computational cost, which seriously limits their application in realistic scenarios. To address this problem, we propose to predict a high-resolution semantic segmentation result from a degraded low-resolution input image, a task we call super-resolution semantic segmentation. We further propose a Relation Calibrating Network (RCNet) for this task. Specifically, we propose two modules: the Relation Upsampling Module (RUM) and the Feature Calibrating Module (FCM). In RUM, the input feature map generates a relation map of pixels at low resolution, which is then gradually upsampled to high resolution. Meanwhile, FCM takes the input feature map and the relation map from RUM as inputs and gradually calibrates the features. Finally, the last FCM outputs the high-resolution segmentation result. We conduct extensive experiments to verify the effectiveness of our method. In particular, we achieve a comparable segmentation result (from 70.01% to 70.90%) with only about 1/4 of the computational cost (from 1107.57 to 255.72 GFLOPs) based on FCN on the Cityscapes dataset. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: A growing number of object detectors reveal the importance of feature representation in improving detection performance. Currently, feature enhancement mainly focuses on the Feature Pyramid Network (FPN) and on Region-of-Interest (RoI) feature fusion in two-stage object detectors. Based on this, we propose an Adaptive Region-aware Feature Enhancement method comprising Adaptive Region-aware FPN (AR-FPN) and Adaptive Region-aware RoI Feature Fusion (AR-RFF) modules. Specifically, AR-FPN captures a position-sensitive map for each level to enhance the pixel-wise degree of interest and make the differences among levels more distinctive. AR-RFF obtains distinguishable RoI features by introducing adaptive region information and eliminating scale inconsistency between the refined and original features. Extensive experiments show that our method achieves at least 1.7% higher AP and strong generalization capability compared to other methods. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: Hierarchical classification is significant for big data: the original task is divided into several sub-tasks that provide multi-granularity predictions based on a tree-shaped label structure. These sub-tasks are highly correlated: results of the coarser-grained sub-tasks can reduce the candidates for the fine-grained sub-tasks, while results of the fine-grained sub-tasks provide attributes describing the coarser-grained classes. A human can integrate feedback from all the related sub-tasks instead of considering each sub-task independently. Therefore, we propose a deep collaborative multi-task network for hierarchical image classification. Specifically, we first extract the relationship matrix between every two sub-tasks defined by the hierarchical label structure. Then, the information of each sub-task is broadcast to all the related sub-tasks through the relationship matrix. Finally, to combine this information, a novel fusion function based on the task evaluation and the decision uncertainty is designed. Extensive experimental results demonstrate that our model can achieve state-of-the-art performance. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: Detection and recognition of scene texts of arbitrary shapes remain a grand challenge due to the rich variation in text line orientations, lengths, curvatures, etc. This paper presents a mask-guided multi-task network that detects and rectifies scene texts of arbitrary shapes reliably. Three types of keypoints are detected, which accurately specify the centre line, and hence the shape, of each text instance. In addition, four types of keypoint links are detected: the horizontal links associate the detected keypoints of each text instance, and the vertical links predict a pair of landmark points (for each keypoint) along the upper and lower text boundaries, respectively. Scene texts can be located by linking up the associated landmark points (giving localization polygon boxes) and rectified by transforming the polygon boxes via thin plate spline. Extensive experiments on several public datasets show that the use of text keypoints is tolerant to variation in text orientations, lengths, and curvatures, and that it achieves competitive scene text detection and rectification performance compared with state-of-the-art methods. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: Existing domain adaptive object detection methods often need to carry a large number of source domain samples for domain adaptation, which is unrealistic in practical applications because of GPU limitations, privacy concerns, and physical memory constraints. To solve this problem, we propose a source data-free domain adaptive object detection method. Only unlabeled target domain data is used to optimize the source domain model so that it works better in the target domain. Our method takes Faster R-CNN as the baseline. Specifically, we first construct global class prototypes, which are updated iteratively, batch by batch. Then, based on the global class prototypes, more accurate pseudo-labels are generated for training the target model. In this way, the source and target domains are also implicitly aligned. Our contributions are 1) a prototype-guided domain adaptation method that uses prototypes to mine semantic category information without accessing the source dataset; 2) a scheme for iteratively updating the global class prototypes that can handle class and sample imbalances in the training procedure; and 3) a more accurate pseudo-label generation method combining semantic information and image information. On multiple public domain adaptive scenarios, our method achieves state-of-the-art accuracy compared with the Faster R-CNN model and some domain adaptive methods that use source datasets. (c) 2021 Elsevier Ltd. All rights reserved.
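As a rough illustration of prototype-guided pseudo-labelling, the sketch below keeps one global prototype per class, refreshes it from each batch, and labels a feature by its most similar prototype. The abstract does not state the update rule or the similarity measure, so the exponential moving average, the `momentum` value, the cosine similarity, and all names here are assumptions made for this example only.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def update_prototypes(prototypes: Dict[int, Vector], feats: List[Vector],
                      labels: List[int], momentum: float = 0.9) -> Dict[int, Vector]:
    """Blend each class prototype with the mean feature of that class in the batch.

    Assumed rule: new = momentum * old + (1 - momentum) * batch_mean.
    """
    updated = {c: list(p) for c, p in prototypes.items()}
    for c in set(labels):
        members = [f for f, lab in zip(feats, labels) if lab == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        if c in updated:
            updated[c] = [momentum * p + (1.0 - momentum) * m
                          for p, m in zip(updated[c], mean)]
        else:
            updated[c] = mean  # first time this class is seen
    return updated


def pseudo_label(feature: Vector, prototypes: Dict[int, Vector]) -> int:
    """Assign the class whose prototype is most similar to the feature."""
    return max(prototypes, key=lambda c: cosine(feature, prototypes[c]))


# Toy example with two classes and 2-D features.
prototypes = {0: [1.0, 0.0], 1: [0.0, 1.0]}
batch_feats = [[0.8, 0.2], [0.1, 0.9]]
batch_labels = [0, 1]  # current pseudo-labels for the batch
prototypes = update_prototypes(prototypes, batch_feats, batch_labels, momentum=0.5)

label_a = pseudo_label([0.7, 0.3], prototypes)
label_b = pseudo_label([0.2, 0.6], prototypes)
```

In a detector, the feature vectors would come from region features of the target-domain images; here they are plain lists so the example stays self-contained.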
Abstract: Feature matching, which refers to determining reliable correspondences between two sets of feature points, is a fundamental component of numerous visual tasks. This paper proposes a novel method, termed guided neighborhood affine subspace embedding (NASE), to eliminate false matches from given tentative feature matches. Its essential idea is to preserve the underlying intrinsic manifold of potential true matches. Specifically, we approximate the manifold of an inlier with an affine subspace fitted on its neighbors by imposing a motion-consistency constraint. Considering that the "corresponding manifold" of inliers may be biased by gross outliers, we introduce a density-based seed point selection strategy for neighborhood refinement. Based on these two strategies, we formulate the general feature matching problem as a mathematical optimization model and derive a closed-form solution with linearithmic time complexity (i.e., O(N log N)) for mismatch removal. Additionally, we devise a multi-scale strategy for neighborhood construction, making our method more robust to various degradations. Extensive experiments on general feature matching, fundamental matrix estimation, and loop closure detection demonstrate the clear superiority of NASE over the state of the art. (C) 2021 Elsevier Ltd. All rights reserved.
Abstract: Clustering ensemble, or consensus clustering, has emerged as a powerful tool for improving both the robustness and the stability of results from individual clustering methods. Weighted clustering ensemble arises naturally from clustering ensemble. One argument for weighted clustering ensemble is that the elements (clusterings or clusters) in a clustering ensemble are of different quality, or that objects or features are of varying significance. However, weighting mechanisms from the classification (supervised) domain cannot be applied directly to the clustering (unsupervised) domain, in part because clustering is inherently an ill-posed problem. This paper provides an overview of weighted clustering ensemble by discussing different types of weights, major approaches to determining weight values, and applications of weighted clustering ensemble to complex data. The unifying framework presented in this paper will help clustering practitioners select the most appropriate weighting mechanisms for their own problems. (c) 2021 Elsevier Ltd. All rights reserved.
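To make the idea of weighting ensemble elements concrete, here is a minimal sketch of one common construction: a co-association matrix in which each base clustering contributes according to a clustering-level weight. It covers only that one type of weight; the weight values, the toy labelings, and the function name are illustrative assumptions rather than a method taken from the overview.

```python
from typing import List, Sequence


def weighted_co_association(clusterings: Sequence[Sequence[int]],
                            weights: Sequence[float]) -> List[List[float]]:
    """Entry (i, j) is the weighted fraction of clusterings that place
    objects i and j in the same cluster (weights are normalized to sum to 1)."""
    n = len(clusterings[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(clusterings, weights):
        share = w / total
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += share
    return matrix


# Three base clusterings of four objects; the first is trusted twice as much.
clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],  # same partition as the first, with permuted label names
]
weights = [2.0, 1.0, 1.0]
co = weighted_co_association(clusterings, weights)
```

A consensus partition can then be obtained by clustering the objects with `co` as a similarity matrix. Because only label equality within each clustering is compared, the matrix is unaffected by how the cluster labels are named.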
Abstract: Forecasting human motion from a sequence of human poses is an important problem in computer vision and robotics. Most previous approaches consider only the temporal dynamics of body joints or joint angles, while neglecting the derivatives of body joints (i.e., pose velocities), which could reduce the impact of noise and improve stability. To exploit the benefits of pose velocities, we propose a velocity-to-velocity learning paradigm for human motion prediction, which builds the sequence-to-sequence model directly in the velocity space. Two variant architectures based on recurrent encoder-decoder networks are introduced under this paradigm. Considering human motion as the kinematics of rigid bodies, joint angles, which denote transformations, are obtained by inverse kinematics computations. Accordingly, a novel loss function in terms of rotation matrices is designed for training, implemented through a rotation matrix transformation (RMT) layer. Finally, we present an effective training algorithm that exploits sequence transformation to improve model generalization. Our approaches substantially outperform state-of-the-art approaches on two large-scale datasets, Human3.6M and CMU Motion Capture, for both short-term and long-term prediction. In particular, our model can competently forecast human-like and meaningful poses up to 1000 milliseconds. The code is available on GitHub: https://github.com/hongsong-wang/RNN_based_human_motion_prediction. (c) 2021 Elsevier Ltd. All rights reserved.
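The velocity-space view can be sketched without any network: poses are turned into first-order differences before modelling, and predicted velocities are accumulated back into poses afterwards. The snippet below shows only this conversion, under the assumption that a pose is a flat vector per frame and that velocity means the difference between consecutive frames. The recurrent encoder-decoder, the RMT layer, and the loss are not reproduced, and the function names are hypothetical.

```python
from typing import List

Pose = List[float]  # one flattened pose vector (e.g. joint angles) per frame


def poses_to_velocities(poses: List[Pose]) -> List[Pose]:
    """First-order differences between consecutive poses."""
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(poses, poses[1:])
    ]


def integrate_velocities(last_pose: Pose, velocities: List[Pose]) -> List[Pose]:
    """Recover poses by accumulating velocities onto the last known pose."""
    poses = []
    current = list(last_pose)
    for v in velocities:
        current = [c + dv for c, dv in zip(current, v)]
        poses.append(current)
    return poses


# Toy sequence of three frames with two pose dimensions each.
observed = [[0.0, 1.0], [0.5, 1.5], [1.5, 1.0]]
velocities = poses_to_velocities(observed)

# Integrating the velocities from the first frame reproduces the later frames.
reconstructed = integrate_velocities(observed[0], velocities)
```

In a prediction setting, a model would consume `velocities` and emit future velocities, which `integrate_velocities` would then turn into future poses starting from the last observed frame.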