Abstract: Joint image denoising algorithms use the structures of a guidance image as a prior to restore a noisy target image. Although guidance images help improve denoising performance, the denoised edges are likely to be blurred, especially when the corresponding edges of the guidance image are weak or absent. To address this weakness, this paper proposes a new gradient-direction-based joint image denoising method in which the absolute cosine of the angle between the gradient vectors of the guidance image and those of the image to be recovered serves as a parallelism measure, ensuring that the gradient directions of the denoised image are approximately the same as, or opposite to, those of the guidance image. In addition, a new edge-preserving regularization term is developed to mitigate the effects of unreliable prior information from the guidance image. To simplify the resulting complex, non-convex, nonlinear fractional model, the logarithm function is used to convert multiplication into addition. We then construct a surrogate function for the logarithmic term of the $\ell_2$-norm and separate the variables to transform the objective function into a convex one with high numerical stability while retaining high efficiency. Finally, the optimal solutions are obtained by directly minimizing the convex functions. Experimental results on public datasets, with comparisons against nine benchmark methods, consistently demonstrate the effectiveness of the proposed method both visually and quantitatively. (C) 2021 Elsevier Ltd. All rights reserved.
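As a rough illustration of the parallelism measure described in this abstract, the sketch below computes the per-pixel absolute cosine between the gradients of a target image and a guidance image. The forward-difference gradient, the `eps` stabilizer, and the function names are assumptions made for illustration; the abstract does not specify the discretization used in the paper.

```python
import math

def forward_gradients(img):
    """Forward-difference gradients (gx, gy); set to zero on the far border."""
    h, w = len(img), len(img[0])
    gx = [[img[r][c + 1] - img[r][c] if c + 1 < w else 0.0 for c in range(w)]
          for r in range(h)]
    gy = [[img[r + 1][c] - img[r][c] if r + 1 < h else 0.0 for c in range(w)]
          for r in range(h)]
    return gx, gy

def abs_cosine_map(target, guidance, eps=1e-8):
    """Per-pixel |cos(angle)| between target and guidance gradient vectors.

    Values near 1 mean the gradients are parallel or anti-parallel;
    values near 0 mean they are perpendicular (or one gradient vanishes).
    """
    tgx, tgy = forward_gradients(target)
    ggx, ggy = forward_gradients(guidance)
    h, w = len(target), len(target[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dot = tgx[r][c] * ggx[r][c] + tgy[r][c] * ggy[r][c]
            norm = (math.hypot(tgx[r][c], tgy[r][c])
                    * math.hypot(ggx[r][c], ggy[r][c]))
            out[r][c] = abs(dot) / (norm + eps)  # eps avoids division by zero
    return out
```

Because of the absolute value, a guidance edge with inverted contrast scores the same as one with matching contrast, which is consistent with the "same as or opposite to" wording of the abstract.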
Abstract: The visual dialog task requires an AI agent to interact with humans in multi-round dialogs based on a visual environment. As a common linguistic phenomenon, pronouns are often used in dialogs to improve communication efficiency. As a result, resolving pronouns (i.e., grounding pronouns to the noun phrases they refer to) is an essential step towards understanding dialogs. In this paper, we propose VD-PCR, a novel framework to improve Visual Dialog understanding with Pronoun Coreference Resolution in both implicit and explicit ways. First, to implicitly help models understand pronouns, we design novel methods to jointly train the pronoun coreference resolution and visual dialog tasks. Second, after observing that the coreference relationship between pronouns and their referents indicates the relevance between dialog rounds, we propose to explicitly prune the irrelevant history rounds from the input of visual dialog models. With pruned input, the models can focus on the relevant dialog history and ignore distractions from the irrelevant rounds. With the proposed implicit and explicit methods, VD-PCR achieves state-of-the-art experimental results on the VisDial dataset. (c) 2022 Elsevier Ltd. All rights reserved.
Abstract: Gait can be used to recognize people in an uncooperative and noninvasive manner, and it is hard to imitate or counterfeit, which makes it suitable for video surveillance. Current solutions for gait recognition are still not robust when the view angles of the gallery and the query differ. We improve the performance of cross-view gait recognition from the perspective of metric learning. Specifically, we propose to use the angular softmax loss to impose an angular margin for extracting separable features. At the same time, we use the triplet loss to make the extracted features more discriminative. Additionally, we add a batch-normalization layer after extracting gait features so that the two different losses can be optimized effectively. We evaluate our approach on two widely used gait datasets: CASIA-B and TUM GAID. The experimental results show that our approach outperforms prior state-of-the-art approaches, which demonstrates its effectiveness. (c) 2021 Elsevier Ltd. All rights reserved.
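The two losses named in this abstract can be sketched in a few lines. The block below is a minimal illustration, assuming the commonly used A-Softmax (SphereFace-style) angular margin function and a Euclidean-distance triplet loss with a fixed margin; the abstract does not state which variants, margins, or mining strategies the paper actually uses, so `m`, `margin`, and both function names are hypothetical.

```python
import math

def angular_margin(theta, m=4):
    """A-Softmax margin psi(theta) = (-1)^k * cos(m*theta) - 2k.

    k is the index of the interval [k*pi/m, (k+1)*pi/m] containing theta,
    which makes psi monotonically decreasing on [0, pi].
    The value m=4 is an assumed default, not taken from the abstract.
    """
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

In a training loop the target-class logit would be scaled by `angular_margin` of the angle between the feature and the class weight, while `triplet_loss` would be averaged over sampled triplets; how the paper weights the two terms is not given in the abstract.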
Abstract: Feature matching is used to build correspondences between features in the model and test images. As an extension of graph matching, hypergraph matching is able to encode rich invariance between feature tuples and improve matching accuracy. Unlike many existing algorithms, which maximize the matching score between correspondences, our approach formulates hypergraph matching as a non-cooperative multi-player game and obtains matches by extracting the evolutionary stable strategies (ESS). While this approach yields high matching accuracy, the number of matches is usually small, and obtaining more matches involves a large computational load. To solve this problem, we extract multiple ESS clusters instead of a single ESS group, thereby transforming hypergraph matching of features into hypergraph clustering of candidate matches. By extracting an appropriate number of clusters, we increase the number of matches efficiently and improve the matching accuracy by imposing the one-to-one constraint. In experiments with three real datasets, our algorithm is shown to generate a large number of matches efficiently. It also shows a significant advantage in matching accuracy over several other hypergraph matching algorithms. (c) 2022 Elsevier Ltd. All rights reserved.
Abstract: Gait recognition, an attractive task in biometrics, remains challenging due to significant intra-class variations in clothing and pose across different cameras. Recent approaches mainly focus on the silhouette-based gait modality, which is easy to model with Convolutional Neural Networks (CNNs). Compared with silhouettes, the dynamics of skeletons convey more robust information, which is invariant to view and clothing changes. Conventional approaches for modeling skeletons usually rely on handcrafted features or traversal rules, resulting in limited expressive power and difficulty in generalizing. In this work, we address the skeleton-based gait recognition task with a novel Symmetry-Driven Hyper Feature Graph Convolutional Network (SDHF-GCN), which goes beyond the limitations of previous approaches by automatically learning multiple dynamic patterns and hierarchical semantic features in a unified Graph Convolutional Network (GCN). The model involves three dynamic patterns: natural connection, temporal correlation, and symmetric interaction, which enrich the description of dynamics by exploiting symmetry perceptual principles. Furthermore, a hyper feature network is proposed to aggregate the hierarchical semantic features, including dynamic features at the high level, structured features at the intermediate level, and static features at the low level, which complement each other to enhance the discriminative ability. By integrating different patterns in the hierarchical structure, the model is able to generate versatile and discriminative representations, thus improving the recognition rate. On the CASIA-B and OUMVLP-Pose datasets, the proposed SDHF-GCN yields substantial improvements over mainstream methods, especially in the coat-wearing scenario, with superior robustness to covariate factors. (c) 2022 Elsevier Ltd. All rights reserved.
Abstract: 3D point cloud reconstruction is a pressing task in computer vision for environment perception. Nevertheless, reconstructed scenes are often inaccurate and incomplete because existing methods do not take the visibility of pixels into account. In this paper, a cascaded network with a multiple cost volume aggregation module, named ADR-MVSNet, is proposed. ADR-MVSNet presents three improvements. First, to improve the reconstruction accuracy and reduce the time complexity, an adaptive depth reduction module is proposed, which adaptively adjusts the depth range of each pixel through the confidence interval. Second, to estimate the depth of occluded pixels in multiview images more accurately, a multiple cost volume aggregation module is proposed, in which Gini impurity is introduced to measure the confidence of the pixel depth prediction. Third, a multiscale photometric consistency filter module is proposed, which considers the information in multiple confidence maps at the same time and accurately filters out outliers to remove pixels with low confidence. As a result, the accuracy of point cloud reconstruction is improved. The experimental results on the DTU and Tanks and Temples datasets demonstrate that ADR-MVSNet achieves highly accurate and highly complete reconstruction compared with state-of-the-art benchmarks. (c) 2022 Elsevier Ltd. All rights reserved.
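To make the confidence measure in this abstract concrete, the sketch below computes the Gini impurity of a per-pixel probability distribution over depth hypotheses. Only the standard definition of Gini impurity is used; how ADR-MVSNet feeds this value into cost volume aggregation is not described in the abstract, and the function name and the normalization step are assumptions.

```python
def gini_impurity(probs):
    """Gini impurity 1 - sum(p_i^2) of a distribution over depth hypotheses.

    0 means all probability mass sits on one depth (high confidence);
    the maximum 1 - 1/n is reached by the uniform distribution (low confidence).
    The input is renormalized in case it does not sum exactly to 1.
    """
    total = sum(probs)
    return 1.0 - sum((p / total) ** 2 for p in probs)
```

For example, a pixel whose probability is concentrated on a single depth plane gives an impurity of 0, while a pixel with a flat distribution over four planes gives 0.75, so a lower value can be read as a more trustworthy depth estimate.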
Abstract: Recovering an unknown object from the magnitude of its Fourier transform is a phase retrieval problem. Here, we consider a much more difficult case, in which the observed intensity values are incomplete and contaminated by both salt-and-pepper and random-valued impulse noise. To take advantage of the low-rank property within the image of the object, we use a regularization term that penalizes high weighted nuclear norm values of image patch groups. To handle outliers (impulse noise) in the observations, the $\ell_{1-2}$ metric is adopted as the data fidelity term. We then break the resulting optimization problem down into smaller subproblems, such as weighted nuclear norm proximal mapping and $\ell_{1-2}$ minimization, because these nonconvex and nonsmooth subproblems have closed-form solutions. Convergence results are also presented, and numerical experiments demonstrate the superior reconstruction quality of the proposed method. (C) 2022 Elsevier Ltd. All rights reserved.
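The data fidelity term mentioned in this abstract can be illustrated with the usual definition of the l(1-2) metric, namely the l1 norm minus the l2 norm. The sketch below assumes that definition and applies it to a residual vector; the exact residual, masking of missing observations, and weighting used in the paper are not given in the abstract, and the function name is hypothetical.

```python
import math

def l1_minus_l2(residual):
    """l(1-2) metric: ||r||_1 - ||r||_2.

    It is always non-negative and equals zero when at most one entry
    of the residual is non-zero, so it favors sparse residuals such as
    those produced by impulse noise.
    """
    l1 = sum(abs(v) for v in residual)
    l2 = math.sqrt(sum(v * v for v in residual))
    return l1 - l2
```

A residual with a single large outlier, such as `[3.0, 0.0, 0.0]`, incurs no penalty, whereas the same energy spread over several entries is penalized, which is why this metric is a natural fit for observations corrupted by sparse impulse noise.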
Abstract: With the popularity of electronic touch-screen and pressure-sensing devices, fine-grained sketch-based image retrieval (FG-SBIR) has become a research hotspot. In this paper, we focus on the core problems of FG-SBIR: (a) how to reduce the discrepancy between heterogeneous media, and (b) how to improve the distinguishability of sketch features. Specifically, a sketch generation model is first proposed to replace the conventional pre-processing step of roughly extracting image edges. This model also alleviates the scarcity of sketch data. We then construct a novel FG-SBIR model that takes advantage of a deformable convolutional neural network while jointly taking semantic attributes into consideration. In addition, we build, for the first time, a fine-grained clothing sketch-image dataset with rich attribute annotations. Extensive experiments show that our proposed model achieves higher retrieval accuracy than state-of-the-art baselines. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: This work addresses the problem of adversarial robustness in deep neural network classification from an optimal class boundary estimation perspective. It is argued that increased model robustness to adversarial attacks can be achieved when the feature learning process is monitored by geometrically inspired optimization criteria. To this end, we propose to learn hyperspherical class prototypes in the neural feature embedding space, along with training the network parameters. Three concurrent optimization functions for the intermediate hidden layer training data activations are devised, requiring items of the same class to be enclosed by the corresponding class prototype boundaries, to have minimum distance from their class prototype vector (i.e., hypersphere center), and to have maximum distance from the remaining hypersphere centers. Our experiments show that training standard classification model architectures with the proposed objectives significantly increases their robustness to white-box adversarial attacks, without adverse (and in some cases with beneficial) effects on their classification accuracy. (c) 2022 Elsevier Ltd. All rights reserved.
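The three criteria listed in this abstract can be sketched for a single embedded sample as follows. This is only an illustrative reading: the hinge form of the enclosure term, the squared distance in the pull term, the use of the nearest foreign center in the push term, and all names are assumptions, since the abstract does not give the actual loss functions.

```python
import math

def prototype_losses(z, label, centers, radii):
    """Return (enclose, pull, push) for one embedding z of class `label`.

    enclose: hinge penalty when z lies outside its class hypersphere.
    pull:    squared distance to the own class center (to be minimized).
    push:    negative distance to the nearest other center, so that
             minimizing it moves z away from foreign prototypes.
    """
    d_own = math.dist(z, centers[label])
    enclose = max(0.0, d_own - radii[label])
    pull = d_own ** 2
    d_others = [math.dist(z, c) for k, c in enumerate(centers) if k != label]
    push = -min(d_others)
    return enclose, pull, push
```

In practice these terms would be summed with weights over a batch and minimized together with the classification loss, with `centers` and `radii` treated as learnable parameters.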
Abstract: Graph-based clustering methods, such as Spectral Clustering (SC), are considered effective unsupervised techniques for partitioning items into several groups. However, SC has three kinds of drawbacks: (1) The clustering performance is sensitive to the affinity matrix, which is fixed by the original data. (2) The input affinity matrix is simply based on distance measurement and lacks a clear physical meaning in terms of probabilistic prediction. (3) Additional discretization procedures are still required. To cope with these issues, we propose a new clustering model, referred to as Entropy Regularization for unsupervised Clustering with Adaptive Neighbors (ERCAN), which dynamically and simultaneously updates the affinity matrix and the clustering results. First, a maximized entropy regularization term is introduced into the probability model to avoid trivial similarity distributions. Additionally, we introduce the Laplacian rank constraint with the $\ell_0$-norm to construct adaptive neighbors, which provides sparsity and a stronger segmentation ability without an extra discretization process. Finally, we present a novel monotonic function optimization method, which reveals the consistency between graph sparsity and neighbor assignment, to address the $\ell_0$-norm constraint in the alternating optimization process. Comprehensive experiments show the superiority of our method. (C) 2022 Elsevier Ltd. All rights reserved.
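The role of the entropy term in this abstract can be illustrated on a simplified single-point problem: choosing neighbor probabilities s_j on the simplex that minimize sum_j d_j s_j + gamma * sum_j s_j log s_j. Without the entropy term the minimizer is the trivial solution that puts all mass on the nearest neighbor; with it, the closed form is s_j proportional to exp(-d_j / gamma). The sketch below implements only this simplified case; the rank constraint and l(0)-norm constraint of ERCAN are omitted, and the function name and `gamma` parameter are assumptions.

```python
import math

def entropy_regularized_neighbors(distances, gamma):
    """Closed-form minimizer of sum(d*s) + gamma*sum(s*log(s)) on the simplex.

    The solution is a softmax of -d/gamma. Subtracting the minimum
    distance before exponentiating keeps the computation numerically stable
    without changing the result.
    """
    shift = min(distances)
    weights = [math.exp(-(d - shift) / gamma) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]
```

A small `gamma` approaches the trivial one-hot assignment, while a large `gamma` approaches a uniform distribution over neighbors, so `gamma` controls how strongly the entropy term spreads the similarity mass.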