Abstract: Human-Object Interaction (HOI) detection aims to detect visual relations between humans and objects in images. One significant problem in HOI detection is that non-interactive human-object pairs can easily be mis-grouped and misclassified as an action, especially when the humans in the scene are close together and performing similar actions. To address this mis-grouping problem, we propose a spatial enhancement approach that enforces fine-level spatial constraints in two directions between human body parts and object parts. At inference, we propose a human-object regrouping approach for object-exclusive actions that exploits the object-exclusive property of the interactive object: the target object should not be shared by more than one human. By suppressing non-interactive pairs, our approach decreases false positives. Experiments on the V-COCO and HICO-DET datasets demonstrate that our approach is more robust than existing methods in the presence of multiple humans and objects in the scene. (c) 2021 Elsevier Ltd. All rights reserved.
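The regrouping rule for object-exclusive actions can be illustrated as a simple suppression step: if several humans are paired with the same object, keep only the highest-scoring pair. This is a minimal sketch of the idea, not the paper's implementation; the pair representation and the name `regroup_exclusive` are assumptions.

```python
def regroup_exclusive(candidate_pairs):
    """For one object-exclusive action, keep only the highest-scoring human
    per object and suppress the rest, since the target object should not be
    shared by more than one human. Pairs are (human_id, object_id, score)."""
    best = {}
    for human, obj, score in candidate_pairs:
        if obj not in best or score > best[obj][2]:
            best[obj] = (human, obj, score)
    return sorted(best.values())

# Humans 0 and 1 both claim object 5; only the stronger pair survives.
pairs = [(0, 5, 0.9), (1, 5, 0.4), (1, 6, 0.7)]
kept = regroup_exclusive(pairs)
```

A full detector would apply this per action class, and only for classes known to be object-exclusive.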
Abstract: This paper jointly tackles the highly correlated tasks of estimating 3D human body poses and predicting future 3D motions from RGB image sequences. Based on a Lie-algebra pose representation, a novel self-projection mechanism is proposed that naturally preserves human motion kinematics. This is further facilitated by a sequence-to-sequence multi-task architecture based on an encoder-decoder topology, which enables us to tap into the common ground shared by both tasks. Finally, a global refinement module is proposed to boost the performance of our framework. The effectiveness of our approach, called PoseMoNet, is demonstrated by ablation tests and empirical evaluations on the Human3.6M and HumanEva-I benchmarks, where competitive performance is obtained compared to the state of the art.
Abstract: Capsule Network (CapsNet) achieves great improvements in recognizing pose and deformation through a novel encoding mode. However, it carries a large number of parameters, leading to heavy memory and computational cost. To solve this problem, we propose a sparse CapsNet with an explicit regularizer. To our knowledge, this is the first work that uses sparse optimization to compress CapsNet. Specifically, to remove unnecessary weight parameters, we first introduce a component-wise absolute-value regularizer into the objective function of CapsNet, based on a zero-mean Laplacian prior. Then, to reduce the computational cost and speed up CapsNet, the weight parameters are further grouped by 2D filters and sparsified by L1-norm regularization. To train our model efficiently, a new stochastic proximal gradient algorithm, which has analytical solutions at each iteration, is presented. Extensive numerical experiments on four commonly used datasets validate the effectiveness and efficiency of the proposed method.
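The building block behind such a training scheme, a proximal gradient step whose L1 prox has a closed-form (soft-thresholding) solution, can be sketched on a toy problem. This is a generic ISTA-style illustration, not the paper's algorithm; the regression demo, the filter-free element-wise setting, and all names are assumptions (the paper's filter-wise grouping is not shown).

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink every component toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ista_step(w, grad, lr, lam):
    """One proximal gradient step for smooth_loss(w) + lam * ||w||_1:
    a gradient step on the smooth part, then the closed-form L1 prox."""
    return soft_threshold(w - lr * grad, lr * lam)

# Toy demo: recover a sparse weight vector from noiseless linear measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
w_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true

w_hat = np.zeros(5)
for _ in range(500):
    grad = X.T @ (X @ w_hat - y) / len(y)   # gradient of 0.5 * mean squared error
    w_hat = ista_step(w_hat, grad, lr=0.05, lam=0.01)
```

The prox step drives small weights exactly to zero, which is what makes the learned model sparse rather than merely small.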
Abstract: Domain adaptation aims at transferring knowledge from a labeled source domain to an unlabeled target domain. Current advances primarily concern a single source domain and neglect the setting of multiple source domains. Previous unsupervised multi-source domain adaptation (MDA) algorithms consider only domain-level alignment, while neglecting the category-level information shared among the domains and the instance variations inside each domain. This paper introduces a Two-Way alignment framework for MDA (TWMDA), which considers both domain-level and category-level alignment and addresses instance variations. We first align the target and the multiple sources at the domain level through an adversarial learning process. To circumvent the drawbacks of adversarial learning, we further reduce the domain gap at the category level by minimizing the distance between the category prototypes and unlabeled target instances. To address instance variations, we design an instance weighting strategy for diverse source instances. The effectiveness of TWMDA is demonstrated on three benchmark datasets for image classification.
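The category-level alignment idea, pulling unlabeled target instances toward source class prototypes, can be sketched in a few lines. This is a minimal illustration under assumed names and a toy feature space, not the paper's TWMDA objective (the adversarial domain-level term and instance weighting are omitted).

```python
import numpy as np

def class_prototypes(src_feats, src_labels, n_classes):
    """Category prototype: the mean source feature of each class."""
    return np.stack([src_feats[src_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def prototype_alignment_loss(tgt_feats, prototypes, tgt_pseudo_labels):
    """Mean squared distance from each unlabeled target instance to the
    prototype of its pseudo-label; minimizing it pulls target features
    toward the source class centroids (category-level alignment)."""
    diffs = tgt_feats - prototypes[tgt_pseudo_labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

src = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
lab = np.array([0, 0, 1, 1])
protos = class_prototypes(src, lab, n_classes=2)
```

In practice the pseudo-labels would come from the current classifier, and the loss would be backpropagated through the target feature extractor.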
Abstract: Breast cancer is one of the most common forms of cancer among women worldwide. The development of computer-aided diagnosis (CAD) technology based on ultrasound imaging to support the diagnosis of breast lesions has attracted the attention of researchers, and deep learning is a popular and effective method. However, most deep-learning-based CAD methods neglect the relationship between two vision tasks: tumor region segmentation and classification. In this paper, taking into account prior medical knowledge, we propose a novel segmentation-to-classification scheme that adds segmentation-based attention (SBA) information to a deep convolutional neural network (DCNN) for breast tumor classification. A segmentation network is trained to generate tumor segmentation enhancement images. Two parallel networks then extract features from the original and segmentation-enhanced images, and a channel-attention-based feature aggregation network automatically integrates the features extracted by the two feature networks to improve the recognition of malignant tumors in breast ultrasound images. To validate our method, experiments have been conducted on breast ultrasound datasets, and the classification results have been compared with those of eleven existing approaches. The experimental results show that the proposed method achieves the highest Accuracy (90.78%), Sensitivity (91.18%), Specificity (90.44%), F1-score (91.46%), and AUC (0.9549).
Abstract: Domain knowledge of face shapes and structures plays an important role in face inpainting. However, general inpainting methods focus mainly on the resolution of generated images without considering the particular structure of human faces, and they generally produce inharmonious facial parts. Existing face-inpainting methods incorporate only one type of facial feature for face completion, and their results are still undesirable. To improve face inpainting quality, we propose a Domain Embedded Generative Adversarial Network (DE-GAN). DE-GAN embeds three types of face domain knowledge (i.e., face mask, face part, and landmark image) into a latent variable space via a hierarchical variational auto-encoder (HVAE) to guide face completion. Two adversarial discriminators, a global discriminator and a patch discriminator, judge whether the generated distribution is close to the real distribution. Experiments on two public face datasets demonstrate that the proposed method generates higher-quality inpainting results with more consistent and harmonious facial structure and appearance than existing methods, achieving state-of-the-art performance, especially for inpainting under pose variations.
Abstract: Most existing person re-identification (re-id) methods assume supervised model training on a separate large set of training samples from the target domain. While performing well in the training domain, such trained models seldom generalise to a new, independent, unlabelled target domain without further labelled training data from that domain. To solve this scalability limitation, we develop a novel Hierarchical Unsupervised Domain Adaptation (HUDA) method. It transfers the labelled information of an existing dataset (a source domain) to an unlabelled target domain for unsupervised person re-id. Specifically, HUDA jointly models global distribution alignment and local instance alignment in a two-level hierarchy to discover transferable source knowledge in unsupervised domain adaptation. Crucially, this approach aims to overcome the under-constrained learning problem of existing unsupervised domain adaptation methods. Extensive evaluations show the superiority of HUDA for unsupervised cross-domain person re-id over a wide variety of state-of-the-art methods on four re-id benchmarks: Market-1501, DukeMTMC, MSMT17 and CUHK03.
Abstract: Using the face as a biometric identity trait is motivated by the contactless nature of the capture process and the high accuracy of recognition algorithms. During the current COVID-19 pandemic, wearing a face mask has been imposed in public places to keep the pandemic under control. However, face occlusion due to wearing a mask presents an emerging challenge for face recognition systems. In this paper, we present a solution to improve masked face recognition performance. Specifically, we propose the Embedding Unmasking Model (EUM), which operates on top of existing face recognition models. We also propose a novel loss function, the Self-restrained Triplet (SRT) loss, which enables the EUM to produce embeddings similar to those of unmasked faces of the same identities. Evaluation results on three face recognition models, two real masked datasets, and two synthetically generated masked face datasets prove that our proposed approach significantly improves performance in most experimental settings.
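For context, the standard triplet loss that SRT builds on can be written compactly. This is the generic formulation only; the Self-restrained Triplet loss adds constraints beyond this form that are not reproduced here, and the batch layout is an assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet loss on embedding batches of shape [batch, dim]:
    pull the anchor toward the positive and push it away from the negative
    by at least `margin` in squared L2 distance."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)
    d_an = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_ap - d_an + margin, 0.0).mean()
```

In the masked-face setting, the anchor would be the EUM output for a masked embedding and the positive the unmasked embedding of the same identity.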
Abstract: In this work we present a framework for designing iterative techniques for image deblurring posed as an inverse problem. The new framework is based on two observations about existing methods. We use the Landweber method as the basis to develop and present the framework, but note that the framework is applicable to other iterative techniques. First, we observed that the iterative steps of the Landweber method contain a constant term that is a low-pass-filtered version of the already blurry observation; we propose a modification that uses the observed image directly. Second, we observed that the Landweber method uses an estimate of the true image as the starting point, but this estimate never gets updated over the iterations; we propose a modification that updates the estimate as the iterative process progresses. We integrate the two modifications into one framework for iteratively deblurring images. Finally, we test the new method and compare its performance with several existing techniques, including the Landweber method, the Van Cittert method, GMRES (generalized minimal residual method), and LSQR (a least-squares solver), to demonstrate its superior performance in image deblurring.
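The baseline the two modifications start from, the classic Landweber iteration, fits in a few lines. This is a textbook sketch on a toy 1-D circulant blur, not the paper's modified method; the operator, signal, and names are assumptions.

```python
import numpy as np

def landweber_deblur(H, y, step, n_iter, x0=None):
    """Classic Landweber iteration for the linear model y = H @ x:
    x_{k+1} = x_k + step * H.T @ (y - H @ x_k).
    Converges for 0 < step < 2 / sigma_max(H)**2."""
    x = np.zeros_like(y) if x0 is None else x0.copy()
    for _ in range(n_iter):
        x = x + step * H.T @ (y - H @ x)
    return x

# Toy 1-D "blur": a circulant moving-average operator applied to a spike signal.
n = 32
kernel = [0.2, 0.6, 0.2]
H = np.zeros((n, n))
for i in range(n):
    for j, w in zip((i - 1, i, i + 1), kernel):
        H[i, j % n] += w

x_true = np.zeros(n)
x_true[10], x_true[20] = 1.0, -0.5
y = H @ x_true                                  # blurred observation
x_hat = landweber_deblur(H, y, step=1.0, n_iter=500)
```

Note the `H.T @ y` part of every step is exactly the "low-pass filtered version of the observation" the abstract refers to.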
Abstract: Despite the great breakthroughs that deep convolutional neural networks (DCNNs) have achieved in image representation learning in recent years, they lack the ability to extract invariant information from images. On the other hand, several traditional feature extractors, such as Gabor filters, are widely used to learn invariant information from images. In this paper, we propose a new class of DCNNs named adaptive Gabor convolutional networks (AGCNs). In an AGCN, the convolutional kernels are adaptively multiplied by Gabor filters to construct Gabor convolutional filters (GCFs), while the parameters of the Gabor functions (i.e., scale and orientation) are learned alongside those of the convolutional kernels. In addition, the GCFs can be regenerated after updating the Gabor filters and convolutional kernels. We evaluate the performance of the proposed AGCNs on image classification using five benchmark image datasets, i.e., MNIST and its rotated version, SVHN, CIFAR-10, CINIC-10, and DogsVSCats. Experimental results show that AGCNs are robust to spatial transformations and achieve higher accuracy than DCNNs and other state-of-the-art deep networks. Moreover, GCFs can easily be embedded into any classical DCNN model (e.g., ResNet) and require fewer parameters than the corresponding DCNNs.
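The GCF construction, multiplying each convolutional kernel element-wise by a Gabor filter, can be sketched as follows. This is an illustrative sketch under assumed parameterization (a common real-Gabor form), not the paper's exact formulation; in an AGCN the `sigma`/`theta` values would be trainable rather than fixed.

```python
import numpy as np

def gabor_filter(ksize, sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter: Gaussian envelope times cosine carrier."""
    half = ksize // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = xx * np.cos(theta) + yy * np.sin(theta)      # rotate coordinates by theta
    yr = -xx * np.sin(theta) + yy * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * xr / lam + psi)

def gabor_modulate(kernels, sigmas, thetas, lam=4.0):
    """Build Gabor convolutional filters (GCFs): each learned kernel is
    element-wise multiplied by a Gabor filter of matching size."""
    return np.stack([k * gabor_filter(k.shape[-1], s, t, lam)
                     for k, s, t in zip(kernels, sigmas, thetas)])

kernels = np.random.default_rng(1).standard_normal((4, 5, 5))
gcfs = gabor_modulate(kernels, sigmas=[2.0] * 4,
                      thetas=[0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4])
```

Because the Gabor parameters are just a handful of scalars per filter, regenerating the GCFs after each update adds very few parameters to the base DCNN.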