Journal information
Pattern Recognition
Pergamon
ISSN: 0031-3203

Indexed in: SCI, AHCI, ISTP, EI
Formally published

    Indefinite twin support vector machine with DC functions programming

    An, Yuexuan; Xue, Hui
    15 pages
    Abstract: Twin support vector machine (TWSVM) is an efficient algorithm for binary classification. However, the lack of the structural risk minimization principle restrains the generalization of TWSVM, and the guarantee of convex optimization constrains TWSVM to positive semi-definite (PSD) kernels only. In this paper, we propose a novel TWSVM for indefinite kernels called indefinite twin support vector machine with difference of convex functions programming (ITWSVM-DC). The indefinite TWSVM (ITWSVM) leverages a maximum margin regularization term to improve the generalization of TWSVM and a smooth quadratic hinge loss function to make the model continuously differentiable. The representer theorem is applied to the ITWSVM and the convexity of the ITWSVM is analyzed. To address the non-convex optimization problem that arises when the kernel is indefinite, a difference of convex functions (DC) decomposition is used to rewrite the non-convex objective function as the subtraction of two convex functions, and a line search method is applied in the DC algorithm to accelerate convergence. A theoretical analysis shows that ITWSVM-DC converges to a local optimum, and extensive experiments on indefinite and positive semi-definite kernels show the superiority of ITWSVM-DC. (c) 2021 Elsevier Ltd. All rights reserved.
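    The DC step in this abstract has a concrete linear-algebra core: a symmetric indefinite kernel matrix always splits into the difference of two positive semi-definite matrices via its eigendecomposition, which is what allows a non-convex kernel objective to be written as a difference of convex functions. A minimal numpy sketch of that split (an illustration of the general technique, not the authors' implementation; `dc_split` is a hypothetical name):

```python
import numpy as np

def dc_split(K):
    """Split a symmetric indefinite kernel matrix K into K_plus - K_minus,
    where both parts are positive semi-definite (the basis of a DC program)."""
    w, V = np.linalg.eigh(K)                    # real spectrum of a symmetric matrix
    K_plus = (V * np.maximum(w, 0.0)) @ V.T     # keep only positive eigenvalues
    K_minus = (V * np.maximum(-w, 0.0)) @ V.T   # negated negative eigenvalues
    return K_plus, K_minus

# A small indefinite Gram matrix (eigenvalues 5 and -1).
K = np.array([[2.0, 3.0],
              [3.0, 2.0]])
Kp, Km = dc_split(K)
assert np.allclose(Kp - Km, K)                       # exact reconstruction
assert np.all(np.linalg.eigvalsh(Kp) >= -1e-9)       # both parts are PSD
assert np.all(np.linalg.eigvalsh(Km) >= -1e-9)
```

    A DC algorithm then iterates by linearizing the concave part (the `K_minus` term) around the current iterate, so each subproblem is convex.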

    Unified learning approach for egocentric hand gesture recognition and fingertip detection

    Alam, Mohammad Mahmudul; Islam, Mohammad Tariqul; Rahman, S. M. Mahbubur
    11 pages
    Abstract: Head-mounted-device-based human-computer interaction often requires egocentric recognition of hand gestures and detection of fingertips. In this paper, a unified approach to egocentric hand gesture recognition and fingertip detection is introduced. The proposed algorithm uses a single convolutional neural network to predict the probabilities of finger classes and the positions of fingertips in one forward propagation. Instead of directly regressing the positions of fingertips from the fully connected layer, an ensemble of fingertip position estimates is regressed from the fully convolutional network, and the ensemble average is then taken as the final position of the fingertips. Since the whole pipeline uses a single network, it is significantly fast in computation. Experimental results show that the proposed method outperforms existing fingertip detection approaches, including the Direct Regression and Heatmap-based frameworks. The effectiveness of the proposed method is also shown in an in-the-wild scenario as well as in a virtual reality use case.
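    The ensemble-average regression described above can be sketched as follows. The shapes `H`, `W`, `F` are hypothetical stand-ins for the network's actual output grid; the point is only that averaging many per-cell position estimates from a fully convolutional head suppresses their individual noise:

```python
import numpy as np

# Hypothetical setup: a fully convolutional head emits an H x W grid of
# (x, y) estimates for each of F fingertips in a single forward pass.
H, W, F = 4, 4, 5
rng = np.random.default_rng(0)
true_xy = rng.uniform(0, 1, size=(F, 2))
# Simulate each grid cell regressing the same target plus noise,
# i.e. an ensemble of position estimates.
grid = true_xy[None, None] + rng.normal(0, 0.05, size=(H, W, F, 2))

# Ensemble average over the spatial grid gives the final fingertip positions.
final_xy = grid.mean(axis=(0, 1))          # shape (F, 2)
assert final_xy.shape == (F, 2)
# Averaging H*W estimates shrinks the per-cell noise by roughly sqrt(H*W).
assert np.abs(final_xy - true_xy).max() < 0.1
```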

    Learning sequentially diversified representations for fine-grained categorization

    Zhang, Lianbo; Huang, Shaoli; Liu, Wei
    12 pages
    Abstract: Learning representations that carry rich local information is essential for recognizing fine-grained objects. Existing methods for this task resort to multi-stage frameworks to capture fine-grained information, but they usually require multiple forward passes of the backbone network, degrading efficiency. In this paper, we propose Sequentially Diversified Networks (SDNs), which enrich representations by promoting their diversity while maintaining extraction efficiency. Specifically, we construct multiple lightweight sub-networks to model mutually different scales of discriminative patterns. The design of these sub-networks follows a sequentially diversified constraint, encouraging them to vary in spatial attention. By inserting these sub-networks into a single backbone network, SDNs enable information interaction among local regions of the fine-grained image. In this way, SDNs jointly promote diversity in scale and spatial attention in a one-stage pipeline, thereby facilitating the efficient learning of diversified representations. We evaluate the proposed method on three challenging datasets, namely CUB-200-2011, Stanford-Cars, and FGVC-Aircraft. Experiments demonstrate its effectiveness in learning diversified information. Moreover, our method achieves state-of-the-art performance while requiring only a single forward pass of the backbone network, which noticeably reduces inference time.

    Deep tree-ensembles for multi-output prediction

    Nakano, Felipe Kenji; Pliakos, Konstantinos; Vens, Celine
    13 pages
    Abstract: Recently, deep neural networks have advanced the state of the art in various scientific fields and provided solutions to long-standing problems across multiple application domains. Nevertheless, they also suffer from weaknesses, since their optimal performance depends on massive amounts of training data and the tuning of a large number of parameters. As a countermeasure, deep-forest methods have recently been proposed as efficient, low-scale solutions. Despite that, these approaches simply employ label classification probabilities as induced features and primarily focus on traditional classification and regression tasks, leaving multi-output prediction under-explored. Moreover, recent work has demonstrated that tree-embeddings are highly representative, especially in structured output prediction. In this direction, we propose a novel deep tree-ensemble (DTE) model, where every layer enriches the original feature set with a representation learning component based on tree-embeddings. In this paper, we specifically focus on two structured output prediction tasks, namely multi-label classification and multi-target regression. We conducted experiments on multiple benchmark datasets, and the obtained results confirm that our method outperforms state-of-the-art methods in both tasks.
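    A minimal sketch of the layer-wise idea, with a drastically simplified "forest" of depth-1 stumps standing in for real trees: each layer appends one-hot leaf-membership codes (the tree-embedding) to the original features. Names such as `dte_layer` are illustrative, not from the paper:

```python
import numpy as np

def tree_embedding(X, stumps):
    """Toy tree-embedding: each 'tree' is a depth-1 stump (feature, threshold);
    a sample's embedding one-hot encodes the leaf it falls into."""
    parts = []
    for feat, thr in stumps:
        right = (X[:, feat] > thr).astype(float)
        parts.append(np.stack([1.0 - right, right], axis=1))  # one-hot over 2 leaves
    return np.concatenate(parts, axis=1)

def dte_layer(X, stumps):
    """One DTE-style layer: the original feature set enriched with tree-embeddings."""
    return np.concatenate([X, tree_embedding(X, stumps)], axis=1)

X = np.array([[0.1, 2.0],
              [0.9, 0.5]])
stumps = [(0, 0.5), (1, 1.0)]               # hypothetical fitted stumps
Z = dte_layer(X, stumps)
assert Z.shape == (2, 2 + 2 * len(stumps))  # original + one-hot leaf codes
assert np.allclose(Z[0, 2:], [1, 0, 0, 1])  # sample 0: left leaf, then right leaf
```

    Stacking further layers on `Z` gives the deep structure; the actual model uses full tree ensembles rather than stumps.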

    Cloud based scalable object recognition from video streams using orientation fusion and convolutional neural networks

    Yaseen, Muhammad Usman; Anjum, Ashiq; Fortino, Giancarlo; Liotta, Antonio; et al.
    13 pages
    Abstract: Object recognition from live video streams comes with numerous challenges, such as variation in illumination conditions and poses. Convolutional neural networks (CNNs) have been widely used for intelligent visual object recognition, yet CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets. To address this problem, we propose a new CNN method based on orientation fusion for visual object recognition. The proposed cloud-based video analytics system pioneers the use of bi-dimensional empirical mode decomposition to split a video frame into intrinsic mode functions (IMFs). These IMFs then undergo the Riesz transform to produce monogenic object components, which are in turn used to train the CNNs. Past work has demonstrated how the object orientation component can be used to reach accuracy levels as high as 93%. Here we demonstrate how a feature-fusion strategy over the orientation components further improves visual recognition accuracy to 97%. We also assess the scalability of our method with respect to both the number and the size of the video streams under scrutiny. We carry out extensive experimentation on the publicly available Yale dataset as well as a self-generated video dataset, finding significant improvements in both accuracy and scale in comparison to AlexNet, LeNet and SE-ResNeXt, three of the most commonly used deep learning models for visual object recognition and classification.
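    The monogenic orientation component mentioned above comes from the Riesz transform, which has a standard frequency-domain form. A numpy sketch of that single step only (the BEMD decomposition and CNN training are omitted; `riesz_orientation` is an illustrative name, not the paper's code):

```python
import numpy as np

def riesz_orientation(img):
    """Frequency-domain Riesz transform of a 2-D image: returns the two
    monogenic components and the local orientation angle."""
    h, w = img.shape
    u = np.fft.fftfreq(h)[:, None]
    v = np.fft.fftfreq(w)[None, :]
    mag = np.hypot(u, v)
    mag[0, 0] = 1.0                        # avoid division by zero at DC
    F = np.fft.fft2(img)
    r1 = np.real(np.fft.ifft2(F * (-1j * u / mag)))
    r2 = np.real(np.fft.ifft2(F * (-1j * v / mag)))
    return r1, r2, np.arctan2(r2, r1)

img = np.add.outer(np.arange(8.0), np.arange(8.0))  # toy gradient image
r1, r2, theta = riesz_orientation(img)
assert r1.shape == r2.shape == theta.shape == img.shape
assert np.all(np.abs(theta) <= np.pi)      # orientation is an angle in [-pi, pi]
```

    In the paper's pipeline this would be applied per IMF, and the resulting orientation maps are fused as CNN input features.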

    Lifelong robotic visual-tactile perception learning

    Dong, Jiahua; Cong, Yang; Sun, Gan; Zhang, Tao; et al.
    12 pages
    Abstract: Lifelong machine learning can learn a sequence of consecutive robotic perception tasks by transferring previous experiences. However, 1) most existing lifelong-learning-based perception methods exploit only visual information for robotic tasks, neglecting the tactile sensing modality, which captures discriminative material properties; and 2) they cannot explore the intrinsic relationships across different modalities or the common characterization among different tasks of each modality, owing to the distinct divergence between heterogeneous feature distributions. To address these challenges, we propose a new Lifelong Visual-Tactile Learning (LVTL) model for continuous robotic visual-tactile perception tasks, which fully explores the latent correlations in both intra-modality and cross-modality aspects. Specifically, a modality-specific knowledge library is developed for each modality to explore common intra-modality representations across different tasks, while narrowing the intra-modality mapping divergence between semantic and feature spaces via an auto-encoder mechanism. Moreover, a sparse-constraint-based modality-invariant space is constructed to capture underlying cross-modality correlations and identify the contribution of each modality to newly arriving visual-tactile tasks. We further propose a modality consistency regularizer to efficiently align the heterogeneous visual and tactile samples, ensuring semantic consistency between the different modality-specific knowledge libraries. After deriving an efficient model optimization strategy, we conduct extensive experiments on several representative datasets to demonstrate the superiority of our LVTL model. Evaluations show that the proposed model significantly outperforms existing state-of-the-art methods, with improvements of about 1.16% to 15.36% under different lifelong visual-tactile perception scenarios.

    A hierarchical model for learning to understand head gesture videos

    Li, Jiachen; Xu, Songhua; Qin, Xueying
    15 pages
    Abstract: Head gesture videos recorded of a person carry rich information about the individual. Automatically understanding these videos can empower many useful human-centered applications in areas such as smart health, education, work safety and security. To understand a video's content, the low-level head gesture signals carried in the video, which capture characteristics of both human postures and motions, need to be translated into high-level semantic labels. To meet this aim, we propose a hierarchical model for learning to understand head gesture videos. Given a head gesture video of arbitrary length, the model first segments the full-length video into multiple short clips for clip-based feature extraction. Multiple base feature extraction procedures are then independently tuned via a set of peripheral learning tasks without consuming any labels of the goal task. These independently derived base features are subsequently aggregated through a multi-task learning framework, coupled with a feature dimensionality reduction module, to learn to accomplish the end video understanding task in a weakly supervised manner, utilizing the limited number of video labels available for the goal task. Experimental results show that the hierarchical model is superior to multiple state-of-the-art peer methods in tackling versatile video understanding tasks.

    SAR-to-optical image translation based on improved CGAN

    Yang, Xi; Zhao, Jingyi; Wei, Ziyu; Wang, Nannan; et al.
    9 pages
    Abstract: SAR images have the advantage of being less susceptible to clouds and lighting, while optical images conform to the human vision system. Both are widely applied in scene classification, natural environment monitoring, disaster warning, etc. However, due to the speckle noise caused by the SAR imaging principle, it is difficult for people without professional knowledge to distinguish ground objects from a complex background. One commonly used solution is to exploit Generative Adversarial Networks (GANs) to translate SAR images into optical images that clearly present ground objects with rich color information, i.e., SAR-to-optical image translation. Traditional GAN-based translation methods are apt to cause blurred contours, vanishing texture and inconsistent color. To this end, we propose an improved conditional GAN (ICGAN) method. Compared with the basic CGAN model, the translation ability of our method is improved in the following three aspects. (1) Contour sharpness: we utilize parallel branches to combine low-level and high-level features, so that contour information is improved without the influence of noise. (2) Texture fine-grainedness: we discriminate the image using multi-scale receptive fields to enrich its local and global texture features. (3) Color fidelity: we use a chromatic aberration loss based on Gaussian blur convolution to reduce the color gap between the generated image and the real optical image. Our method considers both the visual layer and the conceptual layer of the image to complete the SAR-to-optical image translation task. The model preserves the contours and textures of the SAR image while more closely approximating the colors of the ground truth. The experimental results show that the generated images not only achieve preferable visual quality and favorable (subjective and objective) evaluation metrics, but also yield outstanding classification accuracy, which proves the superiority of our method over the state of the art in the SAR-to-optical image translation task.
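    The chromatic aberration loss in point (3) can be sketched as: blur both images so high-frequency texture cancels out, then penalize the remaining low-frequency (color) difference. A toy numpy version with a fixed 3-tap Gaussian kernel, not the paper's exact formulation:

```python
import numpy as np

def gaussian_blur(img, kernel=(0.25, 0.5, 0.25)):
    """Separable 3-tap Gaussian blur with edge padding, applied per channel."""
    k = np.asarray(kernel)
    pad = np.pad(img, ((1, 1), (0, 0), (0, 0)), mode="edge")
    img = k[0] * pad[:-2] + k[1] * pad[1:-1] + k[2] * pad[2:]
    pad = np.pad(img, ((0, 0), (1, 1), (0, 0)), mode="edge")
    return k[0] * pad[:, :-2] + k[1] * pad[:, 1:-1] + k[2] * pad[:, 2:]

def chromatic_aberration_loss(generated, real):
    """Compare colors after blurring, so the loss tracks low-frequency color
    rather than high-frequency texture (a sketch of the idea)."""
    return np.mean(np.abs(gaussian_blur(generated) - gaussian_blur(real)))

rng = np.random.default_rng(1)
real = rng.uniform(0, 1, size=(16, 16, 3))
assert chromatic_aberration_loss(real, real) == 0.0   # identical images
assert chromatic_aberration_loss(real + 0.1, real) > 0.0  # uniform color shift
```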

    Pareto optimization of deep networks for COVID-19 diagnosis from chest X-rays

    Guarrasi, Valerio; D'Amico, Natascha Claudia; Sicilia, Rosa; Cordelli, Ermanno; et al.
    13 pages
    Abstract: The year 2020 was characterized by the COVID-19 pandemic, which had caused, by the end of March 2021, more than 2.5 million deaths worldwide. Since the beginning, besides the laboratory test used as the gold standard, many studies have applied deep learning algorithms to chest X-ray images to recognize COVID-19-infected patients. In this context, we found that convolutional neural networks perform well on a single dataset but struggle to generalize to other data sources. To overcome this limitation, we propose a late fusion approach in which we combine the outputs of several state-of-the-art CNNs, introducing a novel method that constructs an optimal ensemble by determining which and how many base learners should be aggregated. This choice is driven by a two-objective function that maximizes, on a validation set, both the accuracy and the diversity of the ensemble itself. A wide set of experiments on several publicly available datasets, accounting for more than 92,000 images, shows that the proposed approach provides average recognition rates of up to 93.54% when tested on external datasets.
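    The two-objective ensemble selection can be illustrated with a tiny Pareto-front computation over hypothetical validation scores (the accuracy and diversity numbers below are made up, and the paper's actual diversity measure is not specified here):

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (accuracy, diversity): a candidate is
    dominated if another is >= on both objectives and > on at least one."""
    front = []
    for c in candidates:
        dominated = any(
            o["acc"] >= c["acc"] and o["div"] >= c["div"]
            and (o["acc"] > c["acc"] or o["div"] > c["div"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical validation scores for CNN subsets: members -> (accuracy, diversity).
scores = {("A", "B"): (0.90, 0.30), ("A", "C"): (0.89, 0.45),
          ("B", "C"): (0.90, 0.40), ("A", "B", "C"): (0.92, 0.25)}
candidates = [{"members": m, "acc": a, "div": d} for m, (a, d) in scores.items()]
front = pareto_front(candidates)
# ("A", "B") is dominated by ("B", "C"): equal accuracy, strictly more diversity.
assert {c["members"] for c in front} == {("A", "C"), ("B", "C"), ("A", "B", "C")}
```

    A final ensemble would then be picked from this front, e.g. by a preference on accuracy.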

    Learning to schedule multi-NUMA virtual machines via reinforcement learning

    Jin, Bo; Wang, Jun; Wang, Xiangfeng; Zhu, Lei; et al.
    11 pages
    Abstract: With the rapid development of cloud computing, the importance of dynamic virtual machine (VM) scheduling is increasing. Existing works formulate VM scheduling as a bin-packing problem and design greedy methods to solve it. However, cloud service providers have widely adopted multi-NUMA architecture servers in recent years, and existing methods do not consider this architecture. This paper formulates multi-NUMA VM scheduling as a novel structured combinatorial optimization problem and transforms it into a reinforcement learning problem. We propose a reinforcement learning algorithm called SchedRL, with a delta reward scheme and an episodic guided sampling strategy, to solve the problem efficiently. Evaluated on a public Azure dataset under two different scenarios, SchedRL outperforms FirstFit and BestFit in fulfill number and allocation rate.
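    The FirstFit and BestFit baselines mentioned above can be sketched in a toy multi-NUMA model where each VM requests the same number of cores on every NUMA node of a server (a simplification for illustration, not the paper's exact formulation):

```python
def first_fit(servers, demand):
    """FirstFit baseline: place the VM on the first server whose every NUMA
    node has enough free cores; mutates the chosen server's free capacity."""
    for i, nodes in enumerate(servers):
        if all(free >= demand for free in nodes):
            for j in range(len(nodes)):
                nodes[j] -= demand
            return i
    return -1  # request rejected

def best_fit(servers, demand):
    """BestFit baseline: among feasible servers, pick the one with the least
    total remaining capacity, to reduce fragmentation."""
    feasible = [i for i, nodes in enumerate(servers)
                if all(free >= demand for free in nodes)]
    if not feasible:
        return -1
    i = min(feasible, key=lambda i: sum(servers[i]))
    for j in range(len(servers[i])):
        servers[i][j] -= demand
    return i

# Two servers, each with two NUMA nodes holding [a, b] free cores.
servers = [[8, 8], [4, 4]]
assert best_fit(servers, 4) == 1    # BestFit picks the tighter server
assert first_fit(servers, 4) == 0   # FirstFit just scans in order
assert servers == [[4, 4], [0, 0]]
```

    SchedRL replaces these hand-crafted placement rules with a learned policy over the same state (per-node free capacity) and action (server choice) space.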