Journal information
Pattern Recognition
Pergamon
ISSN 0031-3203

Pattern Recognition / Journal Pattern Recognition (SCI, AHCI, ISTP, EI)
Officially published
Years covered

    Weakly Supervised Segmentation of COVID19 Infection with Scribble Annotation on CT Images

    Liu, Xiaoming; Yuan, Quan; Gao, Yaozong; He, Kelei; ...
    15 pages
    Abstract: Segmentation of infections from CT scans is important for accurate diagnosis and follow-up in tackling COVID-19. Although the convolutional neural network has great potential to automate the segmentation task, most existing deep learning-based infection segmentation methods require fully annotated ground-truth labels for training, which is time-consuming and labor-intensive. This paper proposes a novel weakly supervised segmentation method for COVID-19 infections in CT slices, which only requires scribble supervision and is enhanced with uncertainty-aware self-ensembling and transformation-consistent techniques. Specifically, to deal with the difficulty caused by the shortage of supervision, an uncertainty-aware mean teacher is incorporated into the scribble-based segmentation method, encouraging the segmentation predictions to be consistent under different perturbations of an input image. This mean teacher model can guide the student model to be trained using information in images without requiring manual annotations. On the other hand, since the output of the mean teacher contains both correct and unreliable predictions, treating every prediction of the teacher model equally may degrade the performance of the student network. To alleviate this problem, a pixel-level uncertainty measure on the predictions of the teacher model is calculated, and the student model is then guided only by reliable predictions from the teacher model. To further regularize the network, a transformation-consistent strategy is also incorporated, which requires the prediction to follow the same transformation if a transform is performed on an input image of the network. The proposed method has been evaluated on two public datasets and one local dataset. The experimental results demonstrate that the proposed method is more effective than other weakly supervised methods and achieves performance similar to that of fully supervised methods. (c) 2021 Elsevier Ltd. All rights reserved.
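
The self-ensembling step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary (foreground/background) per-pixel probabilities and uses predictive entropy as the uncertainty measure, which the abstract does not specify. The names `ema_update`, `binary_entropy` and `masked_consistency_loss`, the entropy threshold and the decay `alpha` are hypothetical.

```python
import math

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher step: teacher weights follow an exponential moving
    average of the student weights (flat lists of floats here)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

def binary_entropy(p, eps=1e-8):
    """Entropy in nats of a foreground probability p.
    Assumption: entropy stands in for the pixel-level uncertainty."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def masked_consistency_loss(student_probs, teacher_probs, threshold=0.5):
    """Mean squared error between student and teacher predictions,
    computed only on pixels where the teacher is confident."""
    kept = [(s, t) for s, t in zip(student_probs, teacher_probs)
            if binary_entropy(t) < threshold]
    if not kept:
        return 0.0
    return sum((s - t) ** 2 for s, t in kept) / len(kept)

# Three pixels; the teacher is maximally uncertain about the middle one.
teacher_probs = [0.95, 0.50, 0.05]
student_probs = [0.80, 0.90, 0.10]
loss = masked_consistency_loss(student_probs, teacher_probs)

# One EMA step on two toy weights.
new_teacher = ema_update([1.0, 2.0], [0.0, 4.0], alpha=0.9)
```

In this toy input the middle pixel has entropy ln 2 (about 0.69), above the assumed threshold, so it is excluded and the student is guided only by the two confident teacher predictions.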

    Skeleton-based relational reasoning for group activity analysis

    Perez, Mauricio; Liu, Jun; Kot, Alex C.
    12 pages
    Abstract: Research on group activity recognition relies mostly on the standard two-stream approach (RGB and optical flow) for its input features. Few works have explored explicit pose information, and none has used it directly to reason about the persons' interactions. In this paper, we leverage skeleton information to learn the interactions between individuals directly from it. With our proposed method, GIRN, multiple relationship types are inferred from independent modules, which describe the relations between the body joints pair by pair. In addition to the joint relations, we also experiment with the previously unexplored relationship between individuals and relevant objects (e.g. the volleyball). The individuals' distinct relations are then merged through an attention mechanism that gives more importance to the individuals most relevant for distinguishing the group activity. We evaluate our method on the Volleyball dataset, obtaining results competitive with the state of the art. Our experiments demonstrate the potential of skeleton-based approaches for modeling multi-person interactions. (c) 2021 Elsevier Ltd. All rights reserved.

    Reasoning structural relation for occlusion-robust facial landmark localization

    Zhu, Congcong; Li, Xiaoqiang; Li, Jide; Dai, Songmin; ...
    9 pages
    Abstract: In facial landmark localization tasks, various occlusions heavily degrade the localization accuracy due to the partial observability of facial features. This paper proposes a structural relation network (SRN) for occlusion-robust landmark localization. Unlike most existing methods that simply exploit the shape constraint, the proposed SRN aims to capture the structural relations among different facial components. These relations can be considered a more powerful shape constraint against occlusion. To achieve this, a hierarchical structural relation module (HSRM) is designed to hierarchically reason about the structural relations that represent both long- and short-distance spatial dependencies. Compared with existing network architectures, the HSRM can efficiently model the spatial relations by leveraging its geometry-aware network architecture, which reduces the semantic ambiguity caused by occlusion. Moreover, the SRN augments the training data by synthesizing occluded faces. To further extend our SRN to occluded video data, we formulate the occluded face synthesis as a Markov decision process (MDP). Specifically, it plans the movement of the dynamic occlusion based on an accumulated reward associated with the performance degradation of the pre-trained SRN. This procedure augments hard samples for robust facial landmark tracking. Extensive experimental results indicate that the proposed method achieves outstanding performance on occluded and masked faces. Code is available at https://github.com/zhuccly/SRN (c) 2021 Elsevier Ltd. All rights reserved.

    Coarse-to-fine pseudo supervision guided meta-task optimization for few-shot object classification

    Cui, Yawen; Liao, Qing; Hu, Dewen; An, Wei; ...
    11 pages
    Abstract: Few-Shot Learning (FSL) is a challenging and practical learning pattern that aims to solve a target task which has only a few labeled examples. The field of FSL has made great progress, but largely in the supervised setting, where a large auxiliary labeled dataset is required for offline training. The unsupervised FSL (UFSL) problem, where the auxiliary dataset is fully unlabeled, has seldom been investigated despite its significant value. This paper focuses on the more general and challenging UFSL problem and presents a novel method named Coarse-to-Fine Pseudo Supervision-guided Meta-Learning (C2FPS-ML) for unsupervised few-shot object classification. It first obtains prior knowledge from an unlabeled auxiliary dataset during unsupervised meta-training, and then uses the prior knowledge to assist the downstream few-shot classification task. Coarse-to-Fine Pseudo Supervisions in C2FPS-ML aim to optimize the meta-task sampling process in the unsupervised meta-training stage, which is one of the dominant factors for improving the performance of meta-learning based FSL algorithms. Humans can learn new concepts progressively or hierarchically in a coarse-to-fine manner. By simulating this human behaviour, we develop two versions of C2FPS-ML for two different scenarios: natural object datasets and other kinds of datasets (e.g., handwritten character datasets). For the natural object dataset scenario, we propose to exploit the potential hierarchical semantics of the unlabeled auxiliary dataset to build a tree-like structure of visual concepts. For the other scenario, progressive pseudo supervision is obtained by forming clusters in different similarity aspects and is represented by a pyramid-like structure. The obtained structure is applied as the supervision to construct meta-tasks in the meta-training stage, and prior knowledge from the unlabeled auxiliary dataset is learned from the coarse-grained level to the fine-grained level. The proposed method sets the new state of the art on the gold-standard miniImageNet and achieves remarkable results on Omniglot while simultaneously increasing efficiency. (c) 2021 Elsevier Ltd. All rights reserved.
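
To make the idea of constructing meta-tasks from pseudo supervision concrete, here is a rough sketch of sampling N-way K-shot episodes from cluster assignments at a coarse and then a fine level. It is an illustration only: the function `sample_episode`, the two toy label lists and the two-level schedule are hypothetical, and the abstract does not describe how C2FPS-ML actually builds its tree-like or pyramid-like structures.

```python
import random
from collections import defaultdict

def sample_episode(pseudo_labels, n_way, k_shot, n_query, rng):
    """Build one N-way K-shot meta-task from pseudo labels.
    pseudo_labels[i] is the cluster id assigned to unlabeled sample i."""
    by_cluster = defaultdict(list)
    for idx, cluster in enumerate(pseudo_labels):
        by_cluster[cluster].append(idx)
    # Only clusters large enough to fill support and query sets qualify.
    eligible = sorted(c for c, members in by_cluster.items()
                      if len(members) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, cluster in enumerate(classes):
        picked = rng.sample(by_cluster[cluster], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query

# Hypothetical two-level pseudo supervision for 12 unlabeled samples.
coarse = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
fine = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

rng = random.Random(0)
# Coarse-to-fine schedule: tasks from coarse clusters first, fine ones later.
episodes = [sample_episode(level, n_way=2, k_shot=1, n_query=1, rng=rng)
            for level in (coarse, fine)]
```

Each episode is a pair of (sample index, episode label) lists, so a meta-learner never sees the cluster ids themselves, only which samples were grouped together for that task.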

    Gaussian-guided feature alignment for unsupervised cross-subject adaptation

    Zhang, Kuangen; Chen, Jiahong; Wang, Jing; Leng, Yuquan; ...
    14 pages
    Abstract: Human activity recognition (HAR) and human intent recognition (HIR) are important for medical diagnosis and human-robot interaction. HAR and HIR usually rely on the signals of wearable sensors, such as inertial measurement units (IMUs), but these signals may be user-dependent, which degrades the performance of the recognition algorithm on new subjects. Traditional supervised learning methods require labeling signals and training specific classifiers for each new subject, which is burdensome. To deal with this problem, this paper proposes a novel non-adversarial cross-subject adaptation method called Gaussian-guided feature alignment (GFA). The proposed GFA metric quantifies the discrepancy between the labeled features of source subjects and the unlabeled features of target subjects, so that minimizing the GFA metric leads to the alignment of the source and target features. The GFA metric is estimated by calculating the divergence between the feature distribution and a Gaussian distribution, as well as the mean squared error of the mean and variance between source and target features. This paper analytically proves the effect of the GFA metric and validates its performance using three public human activity datasets. Experimental results show that the proposed GFA achieves 1% higher target classification accuracy and 0.5% lower variance than state-of-the-art methods in the case of cross-subject validation. These results indicate that the proposed GFA is feasible for improving the generalization of HAR and HIR. (c) 2021 Elsevier Ltd. All rights reserved.
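
The two ingredients named in this abstract, a divergence to a Gaussian and a squared error between source and target moments, can be illustrated for a single scalar feature. The sketch below rests on several assumptions that are not taken from the paper: features are summarized by their mean and variance, the divergence is the closed-form KL divergence to a standard normal, and both terms are weighted equally. The names `mean_var`, `kl_to_standard_normal` and `gfa_like_metric` are hypothetical.

```python
import math

def mean_var(xs):
    """Sample mean and (biased) variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

def kl_to_standard_normal(m, v):
    """KL( N(m, v) || N(0, 1) ) for a scalar feature; requires v > 0."""
    return 0.5 * (v + m * m - 1.0 - math.log(v))

def gfa_like_metric(source, target):
    """Gaussian term plus squared error of mean and variance.
    Assumption: equal weighting of the two terms."""
    ms, vs = mean_var(source)
    mt, vt = mean_var(target)
    gaussian_term = kl_to_standard_normal(ms, vs) + kl_to_standard_normal(mt, vt)
    moment_term = (ms - mt) ** 2 + (vs - vt) ** 2
    return gaussian_term + moment_term

source = [-1.0, 1.0]   # mean 0, variance 1
target = [0.0, 2.0]    # mean 1, variance 1 (shifted subject)
shifted = gfa_like_metric(source, target)
aligned = gfa_like_metric(source, source)
```

With these toy values the shifted target contributes 0.5 through the Gaussian term and 1.0 through the mean mismatch, while identical, standard-normal-like features give a metric of zero, which is the behaviour a minimization-based alignment would need.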

    JSPNet: Learning joint semantic & instance segmentation of point clouds via feature self-similarity and cross-task probability

    Chen, Feng; Wu, Fei; Gao, Guangwei; Ji, Yimu; ...
    11 pages
    Abstract: In this paper, we propose a novel method named JSPNet to segment 3D point clouds at the semantic and instance levels simultaneously. First, we analyze the problems in addressing joint semantic and instance segmentation, including the common ground for cooperation between the two tasks, the conflict between the two tasks, the quadruplet relation between semantic and instance distributions, and the aspects ignored by existing works. Then we introduce our method to reinforce mutual cooperation and alleviate the essential conflict. Our method has a shared encoder and two decoders to address the two tasks. Specifically, to maintain discriminative features and characterize inconspicuous content, a similarity-based feature fusion module is designed to locate the inconspicuous area in the feature of the current branch and then select related features from the other branch to compensate for the unclear content. Furthermore, given the salient semantic feature and the salient instance feature, a cross-task probability-based feature fusion module is developed to establish the probabilistic correlation between semantic and instance features. This module can transform features from one branch and further fuse them with the other branch by multiplying by a probabilistic matrix. Experimental results on a large-scale 3D indoor point cloud dataset, S3DIS, and a part-segmentation dataset, ShapeNet, demonstrate the superiority of our method over existing state-of-the-art methods in both semantic and instance segmentation. The proposed method outperforms PointNet with 12% and 26% improvements and outperforms ASIS with 2.7% and 4.3% improvements in terms of mIoU and mPre. Code of this work has been made available at https://github.com/Chenfeng1271/JSPNet. (c) 2021 Elsevier Ltd. All rights reserved.

    End-to-End Supermask Pruning: Learning to Prune Image Captioning Models

    Tan, Jia Huei; Chan, Chee Seng; Chuah, Joon Huang
    12 pages
    Abstract: With the advancement of deep models, research on image captioning has led to a remarkable gain in raw performance over the last decade, along with increasing model complexity and computational cost. Surprisingly, however, work on compressing deep networks for the image captioning task has received little to no attention. For the first time in image captioning research, we provide an extensive comparison of various unstructured weight pruning methods on three popular image captioning architectures, namely Soft-Attention, Up-Down and Object Relation Transformer. Following this, we propose a novel end-to-end weight pruning method that performs gradual sparsification based on weight sensitivity to the training loss. The pruning schemes are then extended with encoder pruning, where we show that conducting both decoder pruning and training simultaneously prior to the encoder pruning provides good overall performance. Empirically, we show that an 80% to 95% sparse network (up to 75% reduction in model size) can either match or outperform its dense counterpart. The code and pre-trained models for Up-Down and Object Relation Transformer, which are capable of achieving CIDEr scores > 120 on the MSCOCO dataset with only 8.7 MB and 14.5 MB in model size (size reductions of 96% and 94% respectively against the dense versions), are publicly available at https://github.com/jiahuei/sparse-image-captioning. (c) 2021 Elsevier Ltd. All rights reserved.
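
For readers unfamiliar with unstructured weight pruning, the sketch below shows a simple magnitude-based baseline with a gradually increasing sparsity target. It is not the end-to-end supermask method proposed in the paper, which scores weights by their sensitivity to the training loss; the functions `magnitude_mask` and `linear_sparsity`, the linear ramp and the toy weights are assumptions chosen for illustration.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the given fraction of
    smallest-magnitude weights (unstructured pruning)."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def linear_sparsity(step, total_steps, final_sparsity):
    """Ramp the sparsity target linearly from 0 to final_sparsity.
    Assumption: a linear ramp; other schedules are equally possible."""
    return final_sparsity * min(step, total_steps) / total_steps

weights = [0.05, -0.8, 0.3, -0.01, 1.2, 0.002, -0.4, 0.09, 0.6, -0.07]

# Masks at the start, middle and end of a 10-step pruning schedule.
masks = {}
for step in (0, 5, 10):
    target = linear_sparsity(step, total_steps=10, final_sparsity=0.8)
    masks[step] = magnitude_mask(weights, target)

pruned_weights = [w * m for w, m in zip(weights, masks[10])]
```

At 80% sparsity only the two largest-magnitude weights (-0.8 and 1.2) survive in this toy vector. In practice the mask is applied during training so the remaining weights can adapt as sparsity increases.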

    Who is closer: A computational method for domain gap evaluation

    Liu, Xiaobin; Zhang, Shiliang
    11 pages
    Abstract: Domain gaps between different datasets limit the generalization ability of CNN models. Precise evaluation of the domain gap has the potential to help improve CNN generalization ability. This paper proposes a computational framework to evaluate gaps between different domains, e.g., judging which of several source domains is closer to the target domain. Our model is based on the observation that, given a well-trained classifier on the source domain, the entropy of the classification scores of its output layer can be used as an indicator of the domain gap. For instance, a smaller domain gap generally corresponds to a smaller entropy of classification scores. To further boost the discriminative power in distinguishing domain gaps, a novel training strategy is proposed to supervise the model to produce smaller entropy on one source domain and larger entropy on other source domains. This supervision leads to an efficient and discriminative domain gap evaluation model. Extensive experiments on multiple datasets, including faces, vehicles, fashion, and persons, show that our method can reasonably measure domain gaps. We further conduct experiments on the domain adaptive person ReID task, where our method is applied to pre-trained model selection, pre-trained model fusion, source dataset fusion, and source dataset selection. As shown in the experiments, our method substantially boosts the ReID accuracy. To the best of our knowledge, this is an original work focusing on computational domain gap evaluation. Our code is available at https://github.com/liu-xb/DomainGapEvaluation. (c) 2021 Published by Elsevier Ltd.
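
The observation at the core of this abstract, that the entropy of a classifier's output scores can indicate the domain gap, can be illustrated with a few lines of code. This is only a sketch of the entropy indicator, not of the proposed training strategy; the logits are invented and the function names `softmax`, `entropy` and `mean_entropy` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(batch_logits):
    """Average prediction entropy over a batch, used as the gap indicator."""
    return sum(entropy(softmax(l)) for l in batch_logits) / len(batch_logits)

# Invented logits from one classifier evaluated on two candidate domains.
near_domain = [[6.0, 0.0, 0.0], [0.0, 5.0, 0.5]]   # confident predictions
far_domain = [[1.0, 0.8, 0.9], [0.2, 0.1, 0.3]]    # near-uniform predictions

gap_near = mean_entropy(near_domain)
gap_far = mean_entropy(far_domain)
closer = "near_domain" if gap_near < gap_far else "far_domain"
```

The entropy is bounded above by ln K for K classes, reached when the classifier outputs a uniform distribution, so scores from different classifiers are comparable only when they share the same number of classes.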

    Non-stationary, online variational Bayesian learning, with circular variables

    Christmas, J.
    13 pages
    Abstract: We introduce an online variational Bayesian model for tracking changes in a non-stationary, multivariate, temporal signal, using as an example the changing frequency and amplitude of a noisy sinusoidal signal over time. The model incorporates each observation as it arrives and then discards it, and places priors over precision hyperparameters to ensure that (i) the posterior probability distributions do not become overly tight, which would impede its ability to recognise and track changes, and (ii) no values in the system are able to continuously increase and hence exceed the numerical representation of the programming language. It is thus able to perform truly online processing for an infinitely long set of observations. Only a single round of updates in the variational Bayesian scheme per observation is used, and the complexity of the algorithm is constant in time. The proposed method is demonstrated on a large number of synthetic datasets, comparing the results from the full model (with precision hyperparameters as variables with priors) with those from the base model where the precision hyperparameters are fixed values. The full model is also demonstrated on a set of real climate data. (c) 2021 Published by Elsevier Ltd.

    Context extraction module for deep convolutional neural networks

    Singh, Pravendra; Mazumder, Pratik; Namboodiri, Vinay P.
    11 pages
    Abstract: Convolutional layers convolve the input feature maps to generate valuable output features, and they help deep learning methods significantly in solving complex problems. In order to tackle problems efficiently, deep learning solutions should ensure that the parameters of the model do not increase significantly with the complexity of the problem. Pointwise convolutions are primarily used for parameter reduction in many deep learning architectures. They are convolutional filters of kernel size 1 x 1. The pointwise convolution, however, ignores the spatial information around the points it is processing. This design is by choice, in order to reduce the overall parameters and computations. However, we hypothesize that this shortcoming of pointwise convolution has a significant impact on network performance. We propose a novel alternative design for pointwise convolution, which uses spatial information from the input efficiently. Our approach extracts spatial context information from the input at two scales and further refines the extracted context based on the channel importance. Finally, we add the refined context to the output of the pointwise convolution. This is the first work that improves pointwise convolution by incorporating context information. Our design significantly improves the performance of the networks without substantially increasing the number of parameters and computations. We perform experiments on coarse/fine-grained image classification, few-shot fine-grained classification, and object detection. We further perform various ablation experiments to validate the significance of the different components used in our design. Lastly, we show experimentally that our proposed technique can be combined with existing state-of-the-art network performance improvement approaches to further improve the network performance. (c) 2021 Elsevier Ltd. All rights reserved.
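
The two properties of pointwise convolution mentioned in this abstract, its low parameter count and its blindness to neighbouring pixels, can be checked with a small sketch. It covers only the plain 1 x 1 convolution, not the proposed context extraction module; the helper names `pointwise_conv` and `conv_params` and the toy tensors are hypothetical.

```python
def pointwise_conv(x, w):
    """1 x 1 convolution without bias.
    x: nested lists of shape [H][W][C_in]; w: weights of shape [C_out][C_in].
    Each output pixel is a linear map of the same input pixel's channels."""
    return [[[sum(wk * v for wk, v in zip(w_out, pixel)) for w_out in w]
             for pixel in row] for row in x]

def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution without bias."""
    return k * k * c_in * c_out

x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]               # 2 x 2 image, 2 channels
w = [[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]]    # 3 output channels

y = pointwise_conv(x, w)

# Replace the pixel at (0, 1): the output at (0, 0) must not change.
x_changed = [[x[0][0], [100.0, -100.0]], x[1]]
y_changed = pointwise_conv(x_changed, w)
unchanged = y_changed[0][0] == y[0][0]

params_1x1 = conv_params(64, 64, 1)
params_3x3 = conv_params(64, 64, 3)
```

For 64 input and 64 output channels the 1 x 1 filter bank has 4096 weights against 36864 for a 3 x 3 one, a ninefold saving that comes at the cost of ignoring the spatial neighbourhood.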