Journal Information
Pattern Recognition
Publisher: Pergamon
ISSN: 0031-3203
Indexed in: SCI, AHCI, ISTP, EI
Status: formally published
Years covered:

    From soccer video to ball possession statistics

    Sarkar, Saikat; Mukherjee, Dipti Prasad; Chakrabarti, Amlan
    11 pages
    Abstract: Ball possession statistics in a soccer match are evaluated by counting the number of valid passes made by both teams. Valid passes are determined by monitoring the start and end of a ball-passing event initiated by a player. In this work, we formulate pass detection as the detection of split and merge events among the nodes of a flow network. The players and the ball represent nodes in the network. A group is formed by objects (the ball and players) that are spatially close to each other. Objects belonging to the same group are allowed to split or merge, and we use this group relation to check whether objects split or merge across the sequence of frames. A constraint is added to the network to ensure that two objects can split only if they were previously merged. Flow through a split or merge node of the network denotes a ball-pass event. Additional appear and disappear nodes are added to the network to model the possibility that new objects enter the frame or existing objects leave it. The minimum-cost path in the flow network provides the solution for valid pass events. Experimental evaluation shows that our proposal is at least 4% better at estimating ball possession statistics and 8% better at pass detection in broadcast soccer video than competitive methods. (c) 2021 Elsevier Ltd. All rights reserved.
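The merge/split bookkeeping the abstract describes can be sketched in a few lines. This is a hedged toy illustration of the pass-counting logic only, not the paper's min-cost flow optimization: the `close` helper, its distance threshold, and the frame format are all our own assumptions.

```python
def close(a, b, thresh=2.0):
    """Objects are grouped when spatially close (Euclidean distance)."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 <= thresh

def count_passes(frames):
    """frames: list of dicts {'ball': (x, y), 'players': {pid: (x, y)}}.
    A pass is counted when the ball, merged with one player, later splits
    and merges with a different player -- mirroring the paper's constraint
    that a split is valid only after a prior merge."""
    passes = 0
    holder = None  # player currently merged (grouped) with the ball
    for f in frames:
        merged = [pid for pid, pos in f['players'].items()
                  if close(f['ball'], pos)]
        if merged:
            new_holder = merged[0]
            if holder is not None and new_holder != holder:
                passes += 1  # ball left one player and reached another
            holder = new_holder
    return passes
```

In the full method this counting is recovered as the minimum-cost path through a network with explicit split, merge, appear, and disappear nodes, rather than by the greedy scan above.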

    An empirical study of the impact of masks on face recognition

    Jeevan, Govind; Zacharias, Geevar C.; Nair, Madhu S.; Rajan, Jeny...
    17 pages
    Abstract: Face recognition has a wide range of applications, such as video surveillance, security, and access control. Over the past decade, the field of face recognition has matured and grown in step with the latest advancements in technology, particularly deep learning. Convolutional Neural Networks have surpassed human accuracy in face recognition on popular evaluation benchmarks such as LFW. However, most existing models evaluate their performance under the assumption that full facial information is available. The COVID-19 pandemic has challenged this assumption, and with it the performance of existing methods and leading-edge algorithms in face recognition, in the wake of an explosive increase in the number of people wearing face masks. The reduced amount of facial information available to a recognition system from a masked face impairs its discrimination ability. In this context, we design and conduct a series of experiments comparing the masked face recognition performance of CNN architectures available in the literature and exploring possible alterations in loss functions, architectures, and training methods that can enable existing methods to fully extract and leverage the limited facial information available in a masked face. We evaluate existing CNN-based face recognition systems on datasets composed entirely of masked faces, in contrast to standard evaluations where masked or occluded faces are a rare occurrence. The study also presents evidence of an increased impact of network depth on performance compared with standard face recognition. Our observations indicate that substantial performance gains can be achieved by introducing masked faces into the training set. The study also found that various parameter settings deemed suitable for standard face recognition are not ideal for masked face recognition. Through empirical analysis we derived new value recommendations for these parameters and settings. (c) 2021 Elsevier Ltd. All rights reserved.
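For context, the verification protocol that such evaluations rest on can be sketched as follows: two face embeddings are declared a match when their cosine similarity clears a threshold. The embeddings, the threshold value, and the function names here are illustrative stand-ins for the output of a trained CNN, not the study's actual pipeline.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb1, emb2, threshold=0.5):
    """Declare a match when similarity clears the (assumed) threshold."""
    return cosine_similarity(emb1, emb2) >= threshold
```

Masking shrinks the usable face region, which in this picture pushes genuine-pair similarities down toward the threshold; this is why the study re-tunes parameters that were calibrated for unmasked faces.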

    Explainable scale distillation for hyperspectral image classification

    Shi, Cheng; Fang, Li; Lv, Zhiyong; Zhao, Minghua...
    14 pages
    Abstract: The land covers within an observed remote sensing scene are usually of different scales; therefore, ensembling multi-scale information is a commonly used strategy for more accurate scene interpretation. However, this process is time-consuming. To address this issue, this paper proposes a scale distillation network to explore whether a single-scale classification network can achieve the same (or even better) classification performance as a multi-scale one. The proposed scale distillation network consists of a cumbersome multi-scale teacher network and a lightweight single-scale student network. The former is trained for multi-scale information learning, and the latter improves its classification accuracy by accepting knowledge from the multi-scale teacher network along with the true labels. The experimental results show the advantages of scale distillation for hyperspectral image classification: the single-scale student network can even achieve higher accuracy than the multi-scale teacher network. In addition, a faithful explainable scale network is designed to visually explain the trained scale distillation network. Traditional deep neural networks are black boxes and lack interpretability, and explaining a trained network can uncover hidden information in its predictions. We visually explain the prediction results of the scale distillation network, and the results show that the explainable scale network can more precisely analyze the relationship between the learned scale features and the land-cover categories. Moreover, the possible application of the explainable scale network to classification is further discussed in this study. (c) 2021 Elsevier Ltd. All rights reserved.
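The teacher-to-student transfer described above follows the standard knowledge-distillation recipe: the student is trained against the teacher's temperature-softened predictions plus the true label. A minimal sketch, assuming Hinton-style distillation with toy logits; the temperature, mixing weight, and loss form are our assumptions, and the paper's exact loss may differ.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax (numerically stabilised)."""
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Blend KL(teacher || student) on softened outputs with the usual
    cross-entropy against the true label."""
    p_t = softmax(teacher_logits, T)                 # softened teacher targets
    p_s = softmax(student_logits, T)
    kd = np.sum(p_t * (np.log(p_t) - np.log(p_s)))   # distillation term
    ce = -np.log(softmax(student_logits)[label])     # true-label term
    return alpha * kd + (1 - alpha) * ce
```

A student that agrees with both the teacher and the label incurs a much smaller loss than one that contradicts them, which is the pressure that transfers the teacher's multi-scale knowledge into the single-scale student.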

    Towards automatic threat detection: A survey of advances of deep learning within X-ray security imaging

    Akcay, Samet; Breckon, Toby
    12 pages
    Abstract: X-ray security screening is widely used to maintain aviation/transport security, and its significance poses a particular interest in automated screening systems. This paper reviews computerised X-ray security imaging algorithms by taxonomising the field into conventional machine learning and contemporary deep learning applications. The first part briefly discusses the classical machine learning approaches utilised within X-ray security imaging, while the latter part thoroughly investigates the use of modern deep learning algorithms. The proposed taxonomy sub-categorises deep learning approaches into supervised and unsupervised learning, with a particular focus on object classification, detection, segmentation, and anomaly detection tasks. The paper further explores well-established X-ray datasets and provides a performance benchmark. Based on current and future trends in deep learning, the paper closes with a discussion and future directions for X-ray security imaging. (c) 2021 Published by Elsevier Ltd.

    GeoConv: Geodesic guided convolution for facial action unit recognition

    Chen, Yuedong; Song, Guoxian; Shao, Zhiwen; Cai, Jianfei...
    9 pages
    Abstract: Automatic facial action unit (AU) recognition has attracted great attention but remains a challenging task, as subtle changes of local facial muscles are difficult to capture thoroughly. Most existing AU recognition approaches leverage geometry information in a straightforward 2D or 3D manner, either ignoring 3D manifold information or suffering from high computational costs. In this paper, we propose a novel geodesic guided convolution (GeoConv) for AU recognition that embeds 3D manifold information into 2D convolutions. Specifically, the kernel of GeoConv is weighted by our introduced geodesic weights, which are negatively correlated with geodesic distances on a coarsely reconstructed 3D morphable face model. Moreover, based on GeoConv, we further develop an end-to-end trainable framework named GeoCNN for AU recognition. Extensive experiments on the BP4D and DISFA benchmarks show that our approach significantly outperforms state-of-the-art AU recognition methods. (c) 2021 Elsevier Ltd. All rights reserved.
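The geodesic weighting can be illustrated with a minimal sketch: each kernel position is scaled by a factor that decays with its geodesic distance on the face surface, so nearby-on-the-manifold pixels dominate the response. The exponential decay and the toy distance map are our own assumptions; the paper only specifies that the weights are negatively correlated with distances derived from a reconstructed 3D morphable model.

```python
import numpy as np

def geodesic_weights(distances):
    """Map geodesic distances to kernel weights: larger distance, smaller
    weight (assumed exponential decay)."""
    return np.exp(-distances)

def geoconv_patch(patch, kernel, distances):
    """One geodesically weighted convolution step on a single image patch."""
    return float(np.sum(patch * kernel * geodesic_weights(distances)))
```

With all distances zero this reduces to a plain 2D convolution step, which is how the operator keeps the efficiency of 2D convolutions while injecting 3D manifold information.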

    Leveraging local and global descriptors in parallel to search correspondences for visual localization

    Zhang, Pengju; Zhang, Chaofan; Liu, Bingxi; Wu, Yihong...
    11 pages
    Abstract: Visual localization, which computes the 6DoF camera pose from a given image, has wide applications. Both local and global descriptors are crucial for it. Most existing visual localization methods adopt a two-stage strategy: image retrieval is first performed with global descriptors, and then 2D-3D correspondences are established with local descriptors between 2D query image points and their nearest-neighbor candidates, namely the 3D points visible in the retrieved images. These two stages are performed serially. However, because the 3D points obtained from the retrieval feedback rely only on global descriptors, these methods cannot take full advantage of both local and global descriptors. In this paper, we propose a novel parallel search framework that fully leverages the advantages of both local and global descriptors to obtain nearest-neighbor candidates for a 2D query image point. Specifically, besides deep-learning-based global descriptors, we also utilize local descriptors to construct random tree structures for obtaining nearest-neighbor candidates of the 2D query image point. We propose a new probability model and a new deep-learning-based local descriptor for constructing the random trees. In addition, a weighted Hamming regularization term that preserves discriminativeness after binarization is added to the loss function of the proposed local descriptor. The loss function co-trains both real-valued and binary local descriptors, whose results are integrated into the random trees. Experiments on challenging benchmarks show that the proposed localization method significantly improves robustness and accuracy compared with methods that obtain nearest-neighbor candidates of a query local feature based on either local or global descriptors alone. (c) 2021 Elsevier Ltd. All rights reserved.
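The parallel-search idea reduces to: gather candidate 3D points from both search paths and re-rank the union by local-descriptor distance. A hedged toy sketch, with brute-force distance computation standing in for the paper's random trees and probability model; all names and data shapes are our assumptions.

```python
import numpy as np

def nearest_candidates(query, db_descriptors, retrieval_ids, local_ids, k=2):
    """Merge candidate ids from the global-retrieval path and the
    local-descriptor path, then rank the union by L2 distance to the
    query's local descriptor."""
    candidates = sorted(set(retrieval_ids) | set(local_ids))
    dists = [np.linalg.norm(db_descriptors[i] - query) for i in candidates]
    order = np.argsort(dists)[:k]
    return [candidates[i] for i in order]
```

The point of the union is that a good match missed by global retrieval can still surface through the local-descriptor index, which a purely serial pipeline cannot recover.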

    Video anomaly detection with spatio-temporal dissociation

    Chang, Yunpeng; Tu, Zhigang; Xie, Wei; Luo, Bin...
    12 pages
    Abstract: Anomaly detection in videos remains a challenging task due to the ambiguous definition of anomaly and the complexity of visual scenes in real video data. Unlike previous work that uses reconstruction or prediction as an auxiliary task to learn temporal regularity, we explore a novel convolutional autoencoder architecture that dissociates the spatio-temporal representation to capture spatial and temporal information separately, since abnormal events usually differ from normality in appearance and/or motion behavior. Specifically, the spatial autoencoder models normality in the appearance feature space by learning to reconstruct the input of the first individual frame (FIF), while the temporal part takes the first four consecutive frames as input and the RGB difference as output to simulate the motion of optical flow in an efficient way. Abnormal events, which are irregular in appearance or motion behavior, lead to a large reconstruction error. To improve detection performance on fast-moving outliers, we exploit a variance-based attention module and insert it into the motion autoencoder to highlight regions with large movement. In addition, we propose a deep K-means clustering strategy to force the spatial and motion encoders to extract a compact representation. Extensive experiments on several publicly available datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance. The code is publicly released at the link. (c) 2021 Elsevier Ltd. All rights reserved.
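The two training targets described above can be made concrete with a toy sketch: the spatial branch reconstructs the first individual frame, while the temporal branch predicts the frame-to-frame RGB difference as a cheap surrogate for optical flow. The mean-squared-error score below is a simplified stand-in for the paper's full scoring.

```python
import numpy as np

def rgb_difference(frames):
    """frames: (T, H, W, C) array; returns consecutive frame differences,
    the temporal branch's training target."""
    return frames[1:] - frames[:-1]

def anomaly_score(target, reconstruction):
    """Mean squared reconstruction error; abnormal events, unseen during
    training on normal data, tend to produce large values."""
    return float(np.mean((target - reconstruction) ** 2))
```

At test time a frame would be flagged when the combined spatial and temporal scores exceed a threshold calibrated on normal footage.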

    Multi-attention augmented network for single image super-resolution

    Chen, Rui; Zhang, Heng; Liu, Jixin
    11 pages
    Abstract: Improving the representational power of visual features extracted by deep convolutional neural networks is of crucial importance for high-quality image super-resolution. To address this issue, we propose a multi-attention augmented network, which mainly consists of content-, orientation-, and position-aware modules. Specifically, we develop an attention-augmented U-net structure to form the content-aware module, which learns and combines multi-scale informative features within a large receptive field. To better reconstruct image details in different directions, we design a set of pre-defined sparse kernels to construct the orientation-aware module, which extracts more representative multi-orientation features and enhances discriminative capacity in the stacked convolutional stages. These extracted features are then adaptively fused through a channel attention mechanism. In the upscaling stage, the position-aware module adopts a novel self-attention to reweight the element-wise values of the final low-resolution feature maps, further suppressing possible artifacts. Experimental results demonstrate that our method achieves better reconstruction accuracy and perceptual quality than state-of-the-art methods. (c) 2021 Elsevier Ltd. All rights reserved.
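The channel-attention fusion step can be sketched minimally: global-average-pool each channel, squash the result to a (0, 1) gate, and rescale the feature maps. The sigmoid gating here is the common squeeze-and-excitation-style choice and an assumption on our part; the paper's gating network may differ.

```python
import numpy as np

def channel_attention(features):
    """features: (C, H, W). Returns the maps reweighted per channel so that
    strongly activated channels are emphasised during fusion."""
    pooled = features.mean(axis=(1, 2))        # global average pool per channel
    weights = 1.0 / (1.0 + np.exp(-pooled))    # sigmoid gate in (0, 1)
    return features * weights[:, None, None]
```

In the full network such gates decide how much each orientation-aware feature stream contributes before the upscaling stage.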

    Deep neighbor-aware embedding for node clustering in attributed graphs

    Wang, Chun; Pan, Shirui; Yu, Celina P.; Hu, Ruiqi...
    13 pages
    Abstract: Node clustering aims to partition the vertices of a graph into multiple groups or communities. Existing studies have mostly focused on deep learning approaches that learn a latent representation of nodes, to which simple clustering methods like k-means are then applied. These two-step frameworks are difficult to manipulate and usually lead to suboptimal performance, mainly because the graph embedding is not goal-directed, i.e., not designed for the specific clustering task. In this paper, we propose a clustering-directed deep learning approach, Deep Neighbor-aware Embedded Node Clustering (DNENC for short), for clustering graph data. Our method focuses on attributed graphs in order to sufficiently exploit both sides of the information in a graph. It encodes the topological structure and node content into a compact representation via a neighbor-aware graph autoencoder, which progressively absorbs information from neighbors via a convolutional or attentional encoder. Multiple neighbor-aware encoders are stacked to build a deep architecture, followed by an inner-product decoder for reconstructing the graph structure. Furthermore, soft labels are generated to supervise a self-training process, which iteratively refines the node clustering results. The self-training process is jointly learned and optimized with the graph embedding in a unified framework, so that the two components benefit each other. Experimental comparisons with state-of-the-art algorithms demonstrate the good performance of our framework. (c) 2021 Elsevier Ltd. All rights reserved.
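The soft-label self-training step is commonly realised as in DEC: soft assignments of node embeddings to cluster centres are sharpened into a target distribution that emphasises high-confidence assignments, and the network is trained to match it. A sketch under that assumption, with toy embeddings and centres; the paper's exact formulation may differ.

```python
import numpy as np

def soft_assign(z, centres):
    """q[i, j] ~ (1 + ||z_i - mu_j||^2)^-1: Student's t kernel soft
    assignment of embedding z_i to cluster centre mu_j."""
    d2 = ((z[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p used as soft labels for self-training."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)
```

Minimising the divergence between q and p pulls embeddings toward their likeliest centres, which is how clustering feedback reshapes the embedding instead of being bolted on afterwards.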

    Hyperspectral super-resolution via coupled tensor ring factorization

    He, Wei; Chen, Yong; Yokoya, Naoto; Li, Chao...
    10 pages
    Abstract: Hyperspectral super-resolution (HSR) fuses a low-resolution hyperspectral image (HSI) and a high-resolution multispectral image (MSI) to obtain a high-resolution HSI (HR-HSI). In this paper, we propose a new model called coupled tensor ring factorization (CTRF) for HSR. The proposed CTRF approach simultaneously learns the tensor ring core tensors of the HR-HSI from a pair of HSI and MSI. The CTRF model can separately exploit the low-rank property of each class (Section 3.3), which has not been explored in previous coupled tensor models. Meanwhile, the model inherits the simple representation of coupled matrix/canonical polyadic factorization and the flexible low-rank exploration of coupled Tucker factorization. We further introduce spectral nuclear norm regularization to exploit the global spectral low-rank property. The experiments demonstrate the advantage of the proposed nuclear-norm-regularized CTRF model over previous matrix/tensor and deep learning methods. (c) 2021 Elsevier Ltd. All rights reserved.
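For readers unfamiliar with the representation, a tensor ring (TR) stores a tensor as a cyclic chain of 3-way cores G_k of shape (r_k, n_k, r_{k+1}) with r_N = r_0, and the full tensor is recovered by contracting the cores and closing the ring with a trace. The sketch below only illustrates that reconstruction with toy shapes; the paper couples such cores across the HSI and MSI rather than contracting a single tensor.

```python
import numpy as np

def tr_to_full(cores):
    """Contract a ring of 3-way cores, each of shape (r_in, n, r_out),
    into the full tensor of shape (n_0, ..., n_{N-1})."""
    full = cores[0]
    for g in cores[1:]:
        full = np.tensordot(full, g, axes=([-1], [0]))  # contract shared bond
    return np.trace(full, axis1=0, axis2=-1)            # close the ring
```

With all bond ranks equal to 1 the ring degenerates to an outer (rank-1) product, which is the sense in which TR generalises the simple coupled matrix/CP representation while retaining Tucker-like rank flexibility.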