Journal information
Pattern Recognition
Pergamon
0031-3203

Pattern Recognition / Journal Pattern Recognition (indexed in: SCI, AHCI, ISTP, EI)
Officially published
Years indexed

    Generalizable model-agnostic semantic segmentation via target-specific normalization

    Zhang, Jian; Qi, Lei; Shi, Yinghuan; Gao, Yang; ...
    10 pages
    Abstract: Semantic segmentation in a supervised learning manner has achieved significant progress in recent years. However, its performance usually drops dramatically due to the data-distribution discrepancy between seen and unseen domains when we directly deploy the trained model to segment images from unseen (or newly arriving) domains. To this end, we propose a novel domain generalization framework for the generalizable semantic segmentation task, which enhances the generalization ability of the model from two different views: the training paradigm and the test strategy. Concretely, we exploit model-agnostic learning to simulate the domain shift problem, which addresses domain generalization from the training scheme perspective. Besides, considering the data-distribution discrepancy between seen source and unseen target domains, we develop a target-specific normalization scheme to enhance the generalization ability. Furthermore, when images come one by one in the test stage, we design an image-based memory bank (Image Bank for short) with a style-based selection policy to select similar images and obtain more accurate normalization statistics. Extensive experiments highlight that the proposed method produces state-of-the-art performance for the domain generalization of semantic segmentation on multiple benchmark segmentation datasets, i.e., Cityscapes and Mapillary. (c) 2021 Elsevier Ltd. All rights reserved.
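
The idea of normalizing a test image with statistics pooled from stylistically similar images can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the style descriptor, the selection rule, or where the normalization is applied, so the per-channel mean/std style vector, the Euclidean nearest-neighbour selection, and the names `style_vector`, `select_similar`, and `target_specific_normalize` are all hypothetical.

```python
import numpy as np

def style_vector(img):
    """Per-channel mean and std of an image shaped (C, H, W).

    Assumption: this simple descriptor stands in for the paper's style measure.
    """
    return np.concatenate([img.mean(axis=(1, 2)), img.std(axis=(1, 2))])

def select_similar(query, bank, k):
    """Return up to k bank images whose style vectors are closest to the query's."""
    if not bank:
        return []
    q = style_vector(query)
    dists = [np.linalg.norm(style_vector(b) - q) for b in bank]
    order = np.argsort(dists)[:k]
    return [bank[i] for i in order]

def target_specific_normalize(query, bank, k=2, eps=1e-5):
    """Normalize the query with statistics pooled over itself and its
    k most similar bank images (hypothetical pooling rule)."""
    group = np.stack([query] + select_similar(query, bank, k))  # (N, C, H, W)
    mean = group.mean(axis=(0, 2, 3), keepdims=True)[0]         # (C, 1, 1)
    var = group.var(axis=(0, 2, 3), keepdims=True)[0]           # (C, 1, 1)
    return (query - mean) / np.sqrt(var + eps)
```

With an empty bank the function falls back to per-image statistics; as the bank fills up during testing, the pooled statistics are drawn from more images of similar style.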

    Nonconvex 3D array image data recovery and pattern recognition under tensor framework

    Yang, Ming; Luo, Qilun; Li, Wen; Xiao, Mingqing; ...
    13 pages
    Abstract: In this paper, we present a weighted tensor Schatten-p quasi-norm (0 < p < 1) regularizer for 3D array datasets in order to recover the low-rank part and the sparse part, respectively. Corresponding algorithms based on augmented Lagrangian multipliers are established, and the constructed sequence converges to the desired Karush-Kuhn-Tucker (KKT) point, which is mathematically validated in detail. Although the proposed weighted tensor Schatten-p quasi-norm is non-convex, it not only penalizes the singular values less but is also effective in capturing the low-rank property. The main finding of this paper is that the appropriate choice of p depends on the specific task: low-rank data set recovery usually requires a relatively large value of p, while sparse data set recovery needs a relatively small value of p. In addition, the weights in our tensor Schatten-p quasi-norm are chosen to vary inversely, and exponentially, with the singular values in order to promote sensitivity to different singular values. Experimental results for video inpainting (tensor completion), image recovery, and salient object detection (tensor robust principal component analysis) show that the proposed approach outperforms various recent approaches in the literature. (c) 2021 Published by Elsevier Ltd.
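
A rough sketch of how a weighted Schatten-p quasi-norm of a 3D array might be evaluated is given below. Several choices are assumptions, not taken from the abstract: the t-SVD convention (FFT along the third mode, averaged over frontal slices), the hypothetical weights `w_i = 1 / (sigma_i + eps)` (the paper's exact weight formula is not stated here), and the function name `weighted_schatten_p`.

```python
import numpy as np

def weighted_schatten_p(X, p=0.5, eps=1e-3):
    """Weighted Schatten-p quasi-norm (raised to the p-th power) of a 3D array.

    Assumptions: t-SVD convention (FFT along axis 2, average over frontal
    slices) and hypothetical weights w_i = 1 / (sigma_i + eps).
    """
    n3 = X.shape[2]
    Xf = np.fft.fft(X, axis=2)  # frontal slices in the Fourier domain
    total = 0.0
    for k in range(n3):
        sigma = np.linalg.svd(Xf[:, :, k], compute_uv=False)
        w = 1.0 / (sigma + eps)  # larger singular values get smaller weights
        total += np.sum(w * sigma ** p)
    return float(total / n3)
```

Because the weights shrink as the singular values grow, large singular values (the dominant low-rank structure) are penalized less than small ones, which matches the behaviour the abstract describes qualitatively.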

    A multimodal attention fusion network with a dynamic vocabulary for TextVQA

    Wu, Jiajia; Du, Jun; Wang, Fengren; Yang, Chen; ...
    10 pages
    Abstract: Visual question answering (VQA) is a well-known problem in computer vision. Recently, text-based VQA tasks have been receiving more and more attention because text information is very important for image understanding. The key to this task is to make good use of the text information in the image. In this work, we propose an attention-based encoder-decoder network that combines the multimodal information of visual, linguistic, and location features. By using the attention mechanism to focus on the features that are key to the question, our multimodal feature fusion can provide more accurate information to improve performance. Furthermore, we present a decoder with an attention map loss, which can not only predict complex answers but also deal with a dynamic vocabulary to reduce the decoding space. Compared with the softmax-based cross-entropy loss, which can only handle a fixed-length vocabulary, the attention map loss significantly improves accuracy and efficiency. Our method achieved first place in all three tasks of the ICDAR2019 robust reading challenge on scene text visual question answering (ST-VQA). (c) 2021 Elsevier Ltd. All rights reserved.
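
To illustrate what decoding over a dynamic vocabulary can look like, here is a minimal sketch in which the decoder scores a variable-length list of candidate tokens (for example, words read from the image) instead of a fixed output layer. The dot-product scoring and the name `attend_over_candidates` are assumptions for illustration; the paper's attention map loss and network details are not reproduced.

```python
import numpy as np

def attend_over_candidates(decoder_state, candidate_embeddings):
    """Return attention weights over a variable number of candidates.

    decoder_state: shape (d,); candidate_embeddings: shape (n, d), where n
    may differ from image to image (the "dynamic vocabulary").
    """
    scores = candidate_embeddings @ decoder_state
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()
```

The output length follows the number of candidates, so the decoding space is only as large as the text actually present in the image.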

    SARS-Net: COVID-19 detection from chest x-rays by combining graph convolutional network and convolutional neural network

    Kumar, Aayush; Tripathi, Ayush R.; Satapathy, Suresh Chandra; Zhang, Yu-Dong; ...
    13 pages
    Abstract: COVID-19 has emerged as one of the deadliest pandemics that has ever crept on humanity. Screening tests are currently the most reliable and accurate steps in detecting severe acute respiratory syndrome coronavirus in a patient, and the most used is RT-PCR testing. Various researchers and early studies implied that visual indicators (abnormalities) in a patient's chest X-ray (CXR) or computed tomography (CT) imaging were a valuable characteristic of a COVID-19 patient that can be leveraged to detect the virus in a vast population. Motivated by various contributions to the open-source community to tackle the COVID-19 pandemic, we introduce SARS-Net, a CADx system combining Graph Convolutional Networks and Convolutional Neural Networks for detecting abnormalities in a patient's CXR images that indicate the presence of COVID-19 infection. In this paper, we introduce and evaluate the performance of a custom-made deep learning architecture, SARS-Net, to classify and detect chest X-ray images for COVID-19 diagnosis. Quantitative analysis shows that the proposed model achieves higher accuracy than previously mentioned state-of-the-art methods. It was found that our proposed model achieved an accuracy of 97.60% and a sensitivity of 92.90% on the validation set. (c) 2021 Elsevier Ltd. All rights reserved.

    Learning-based resilience guarantee for multi-UAV collaborative QoS management

    Bai, Chengchao; Yan, Peng; Yu, Xiaoqiang; Guo, Jifeng; ...
    13 pages
    Abstract: Unmanned and intelligent technologies are the future development trend in the business field. They are of great significance for the connotation analysis and application characterization of massive interactive data. In particular, during major epidemics or disasters, providing business services safely and securely is crucial. Specifically, providing users with resilient and guaranteed communication services is a challenging business task when the communication facilities are damaged. Unmanned aerial vehicles (UAVs), with flexible deployment and high maneuverability, can serve as aerial base stations (BSs) to establish emergency networks. However, it is challenging to control multiple UAVs to provide efficient and fair communication quality of service (QoS) to users due to their limited communication service capabilities. In this paper, we propose a learning-based resilience guarantee framework for multi-UAV collaborative QoS management. We formulate this problem as a partially observable Markov decision process and solve it with proximal policy optimization (PPO), which is a policy-based deep reinforcement learning method. A centralized training and decentralized execution paradigm is used, where the experience collected by all UAVs is used to train the shared control policy. Each UAV takes actions based on the partial environment information it observes. In addition, the design of the reward function considers the average and variance of the communication QoS of all users. Extensive simulations are conducted for performance evaluation. The simulation results indicate that (1) the trained policies can adapt to different scenarios and provide resilient and guaranteed communication QoS to users, (2) increasing the number of UAVs can compensate for the lack of service capabilities of UAVs, and (3) when UAVs have local communication service capabilities, the policies trained with PPO perform better than the policies trained with other algorithms. (c) 2021 Published by Elsevier Ltd.
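
A reward that accounts for both the average and the variance of the users' QoS could take a form like the one below. The abstract only says that both quantities are considered; the linear combination, the `variance_weight` parameter, and the name `fairness_reward` are assumptions made for this sketch.

```python
import numpy as np

def fairness_reward(qos, variance_weight=1.0):
    """Reward high average QoS and penalize unequal QoS across users.

    Assumption: reward = mean(qos) - variance_weight * var(qos).
    """
    qos = np.asarray(qos, dtype=float)
    return float(qos.mean() - variance_weight * qos.var())
```

For example, `fairness_reward([1, 1, 1])` and `fairness_reward([0, 1, 2])` share the same mean, but the second is lower because service is spread unevenly, which is the fairness pressure the abstract alludes to.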

    Multi-task driven explainable diagnosis of COVID-19 using chest X-ray images

    Malhotra, Aakarsh; Mittal, Surbhi; Majumdar, Puspita; Chhabra, Saheb; ...
    13 pages
    Abstract: With the increasing number of COVID-19 cases globally, all countries are ramping up their testing numbers. While RT-PCR kits are available in sufficient quantity in several countries, others are facing challenges with the limited availability of testing kits and processing centers in remote areas. This has motivated researchers to find alternative testing methods that are reliable, easily accessible, and faster. Chest X-ray (CXR) is one of the modalities that is gaining acceptance as a screening modality. In this direction, the paper has two primary contributions. Firstly, we present the COVID-19 Multi-Task Network (COMiT-Net), which is an automated end-to-end network for COVID-19 screening. The proposed network not only predicts whether the CXR has COVID-19 features present or not, it also performs semantic segmentation of the regions of interest to make the model explainable. Secondly, with the help of medical professionals, we manually annotate the lung regions and semantic segmentation of COVID-19 symptoms in CXRs taken from the ChestXray-14, CheXpert, and a consolidated COVID-19 dataset. These annotations will be released to the research community. Experiments performed with more than 2500 frontal CXR images show that at 90% specificity, the proposed COMiT-Net yields 96.80% sensitivity. (c) 2021 Published by Elsevier Ltd.
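
The metric "sensitivity at 90% specificity" can be computed from classifier scores by sweeping the decision threshold, as in this sketch. The function name `sensitivity_at_specificity` and the toy scores in the usage note are illustrative and not data from the paper; the sketch assumes both classes are present in `labels`.

```python
import numpy as np

def sensitivity_at_specificity(scores, labels, target_specificity=0.90):
    """Best sensitivity over all thresholds whose specificity meets the target.

    scores: higher means "more likely positive"; labels: 1 = positive, 0 = negative.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t
        tn = np.sum(~pred & ~labels)
        fp = np.sum(pred & ~labels)
        tp = np.sum(pred & labels)
        fn = np.sum(~pred & labels)
        specificity = tn / (tn + fp)
        sensitivity = tp / (tp + fn)
        if specificity >= target_specificity:
            best = max(best, sensitivity)
    return float(best)
```

With four negatives scored 0.1, 0.2, 0.3, 0.4 and four positives scored 0.9, 0.8, 0.7, 0.35, reaching at least 90% specificity requires rejecting all four negatives, which leaves three of the four positives detected.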

    Discrepant multiple instance learning for weakly supervised object detection

    Gao, Wei; Wan, Fang; Yue, Jun; Xu, Songcen; ...
    11 pages
    Abstract: Multiple Instance Learning (MIL) is a fundamental method for weakly supervised object detection (WSOD), but it has difficulty excluding locally optimal solutions and may miss objects or falsely localize object parts. In this paper, we introduce discrepantly collaborative modules into MIL and thereby create discrepant multiple instance learning (D-MIL), pursuing optimal solutions in a simple yet effective way. D-MIL adopts multiple MIL learners to pursue discrepant yet complementary solutions indicating object parts, which are fused by a collaboration module for precise object localization. D-MIL implements a new "teachers-students" model, where MIL learners act as "teachers" and object detectors as "students". Multiple teachers provide rich yet complementary information, which is absorbed by the students and transferred back to reinforce the performance of the teachers. Experiments show that D-MIL significantly improves on the baseline while achieving state-of-the-art performance on the challenging MS-COCO object detection benchmark. (c) 2021 Elsevier Ltd. All rights reserved.

    High dynamic range imaging via gradient-aware context aggregation network

    Yan, Qingsen; Gong, Dong; Shi, Javen Qinfeng; Hengel, Anton van den; ...
    16 pages
    Abstract: Obtaining a high dynamic range (HDR) image from multiple low dynamic range (LDR) images with different exposures is an important step in various computer vision tasks. One of the ongoing challenges in the field is to generate HDR images without ghosting artifacts. Motivated by the observation that such artifacts are particularly noticeable in the gradient domain, in this paper we propose an HDR imaging approach that aggregates the information from multiple LDR images with guidance from the image gradient domain. The proposed method generates artifact-free images by integrating the image gradient information and the image context information in the pixel domain. The context information in a large area helps to reconstruct the contents contaminated by saturation and misalignments. Specifically, an additional gradient stream and supervision in the gradient domain are applied to incorporate the gradient information in HDR imaging. To use the context information captured from a large area while preserving spatial resolution, we adopt dilated convolutions to extract multi-scale features with rich context information. Moreover, we build a new dataset containing 40 groups of real-world images from diverse scenes with ground truth to validate the proposed model. The samples in the proposed dataset include more challenging moving objects that induce misalignments. Extensive experimental results demonstrate that our proposed model outperforms previous methods on different datasets in terms of both quantitative measures and visual perception quality. (c) 2021 Elsevier Ltd. All rights reserved.
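
The way a dilated convolution widens the receptive field without reducing resolution or adding weights can be seen in a one-dimensional sketch. This is a generic illustration of the operation, not the paper's network; the name `dilated_conv1d` is hypothetical, and the 1D setting is a simplification of the 2D convolutions used on images.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation, as in deep learning).

    Kernel taps are spaced `dilation` samples apart, so a kernel of length K
    covers (K - 1) * dilation + 1 input samples.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(x) - span + 1
    if out_len <= 0:
        raise ValueError("input is shorter than the dilated kernel span")
    return np.array(
        [np.dot(x[i : i + span : dilation], kernel) for i in range(out_len)]
    )
```

With a three-tap kernel, `dilation=1` looks at 3 neighbouring samples while `dilation=2` looks at 5, using the same three weights; stacking several dilation rates gives the multi-scale context the abstract mentions.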

    Privacy-aware supervised classification: An informative subspace based multi-objective approach

    Biswas, Chandan; Ganguly, Debasis; Mukherjee, Partha Sarathi; Bhattacharya, Ujjwal; ...
    8 pages
    Abstract: Sharing the raw or an abstract representation of a labelled dataset on cloud platforms can potentially expose sensitive information in the data to an adversary. For example, in an emotion classification task on text, an adversary-agnostic abstract representation of the text data may eventually allow an adversary to identify the demographics of the authors, such as their gender and age. In this paper, we propose a universal defense mechanism against such malicious attempts to steal sensitive information from data shared on cloud platforms. More specifically, our proposed method employs an informative subspace based multi-objective approach to obtain a sensitive-information-aware encoding of the data representation. A number of experiments conducted on both standard text and image datasets demonstrate that our proposed approach is able to reduce the effectiveness of the adversarial task (i.e., it better protects the sensitive information in the data) without significantly reducing the effectiveness of the primary task itself. (c) 2021 Elsevier Ltd. All rights reserved.

    Kernelized support tensor train machines

    Chen, Cong; Batselier, Kim; Yu, Wenjian; Wong, Ngai; ...
    11 pages
    Abstract: The tensor, a multi-dimensional data structure, has been exploited recently in the machine learning community. Traditional machine learning approaches are vector- or matrix-based, and cannot handle tensorial data directly. In this paper, we propose a tensor train (TT)-based kernel technique for the first time, and apply it to the conventional support vector machine (SVM) for high-dimensional image classification with a very small number of training samples. Specifically, we propose a kernelized support tensor train machine that accepts tensorial input and preserves the intrinsic kernel property. The main contributions are threefold. First, we propose a TT-based feature mapping procedure that maintains the TT structure in the feature space. Second, we demonstrate two ways to construct the TT-based kernel function while considering consistency with the TT inner product and preservation of information. Third, we show that it is possible to apply different kernel functions on different data modes. In principle, our method tensorizes the standard SVM on its input structure and kernel mapping scheme. This reduces the storage and computation complexity of kernel matrix construction from exponential to polynomial. The validity proof and computation complexity of the proposed TT-based kernel functions are provided in detail. Extensive experiments are performed on high-dimensional fMRI and color image datasets, which demonstrate the superiority of the proposed scheme compared with state-of-the-art techniques. (c) 2021 Elsevier Ltd. All rights reserved.
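
The TT inner product that the kernel construction is said to stay consistent with can be computed core by core, without ever forming the full tensors, which is where the exponential-to-polynomial saving comes from. The sketch below shows only this plain (linear) inner product; the paper's feature mapping and kernel functions are not reproduced. The core layout `(r_prev, n, r_next)` with boundary ranks of 1 and the names `tt_inner` and `tt_to_full` are assumptions of this example.

```python
import numpy as np

def tt_inner(a_cores, b_cores):
    """Inner product of two tensors given as TT cores of shape (r_prev, n, r_next).

    The running matrix m contracts one mode at a time, so the cost grows
    polynomially with the number of modes instead of exponentially.
    """
    m = np.ones((1, 1))
    for A, B in zip(a_cores, b_cores):
        # m[a, b] * A[a, n, c] * B[b, n, d], summed over a, b, n -> m[c, d]
        m = np.einsum("ab,anc,bnd->cd", m, A, B)
    return float(m[0, 0])

def tt_to_full(cores):
    """Expand TT cores into the full tensor (only for checking small cases)."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))
```

On small random cores, `tt_inner(a, b)` agrees with `np.sum(tt_to_full(a) * tt_to_full(b))`, which is a convenient sanity check before using the core-wise routine on tensors too large to expand.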