Journal: Pattern Recognition, 2022, Volume 122 (National Academic Search)

Journal information

Pattern Recognition

Publisher: Pergamon

ISSN: 0031-3203

Indexed in: SCI, AHCI, ISTP, EI

Status: officially published

Segmentation of Handwritten Arabic Graphemes Using a Directed Convolutional Neural Network and Mathematical Morphology Operations

Elkhayati, Mohsine; Elkettani, Youssfi; Mourchid, Mohammed

15 pages

Abstract: Due to the nature of Arabic handwriting, segmenting words into characters/graphemes is the most difficult and critical task of the recognition system. The present paper proposes an approach to segment handwritten Arabic words into graphemes based on a directed Convolutional Neural Network (CNN) and Mathematical Morphology Operations (MMO). Arabic script is cursive, which means that almost all graphemes are connected via horizontal links; therefore, a technique to remove links will facilitate the segmentation of graphemes. In general, an MMO such as erosion seems suitable for getting the job done, but since Arabic handwriting is difficult, MMOs cause information loss and suffer from many issues such as diacritics and over-traces, which lead to over/under/bad segmentations. To overcome these limitations, the present paper addresses these issues in the following order: the over-traces issue is addressed for the first time in the literature; a robust algorithm for diacritics extraction is provided; and finally, the main segmentation algorithm adopts a strategy based on a Partial Dilation (PD)-Global Erosion (GE) technique to combat the information loss issue. The PD phase amplifies important regions, while GE eliminates links between graphemes. The complementarity between PD and GE facilitates the extraction of graphemes and creates resistance against information loss. To properly tackle these difficult problems, this article exploits the robustness of CNNs, so a new directed CNN model is suggested. The idea is to draw the model's attention to certain targeted features, which are selected according to the nature of the problem addressed. The proposed directed CNN is used in all phases of the segmentation process. The experimental results are very encouraging and show that the proposed directed CNN model outperformed basic CNN in many experiments. The results also reveal that the followed strategy improved the ability of MMOs to perform segmentation and to compete with other approaches in this research area. (c) 2021 Elsevier Ltd. All rights reserved.
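
The abstract's remark that erosion can remove the horizontal links between graphemes can be illustrated with a minimal sketch. This is not the paper's PD-GE algorithm or its directed CNN: it assumes a toy binary image, a vertical structuring element three pixels high, and hypothetical helper names (`erode_vertical`, `count_components`), and it only shows that a one-pixel-thick link disappears under erosion while thicker strokes survive.

```python
def erode_vertical(img, height=3):
    """Binary erosion with a vertical line structuring element of odd `height`."""
    rows, cols = len(img), len(img[0])
    half = height // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # A pixel survives only if every pixel under the element is
            # foreground; positions outside the image count as background.
            out[r][c] = int(all(
                0 <= r + d < rows and img[r + d][c] == 1
                for d in range(-half, half + 1)
            ))
    return out


def count_components(img):
    """Count 4-connected foreground components with an iterative flood fill."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] == 1 and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# Toy "word": two thick strokes joined by a one-pixel-high horizontal link.
picture = [
    "...........",
    ".###...###.",
    ".###...###.",
    ".#########.",
    ".###...###.",
    ".###...###.",
    "...........",
]
word = [[1 if ch == "#" else 0 for ch in row] for row in picture]

print(count_components(word))                  # connected strokes: 1 component
print(count_components(erode_vertical(word)))  # link removed: 2 components
```

The eroded strokes are also two rows shorter than the originals, which is the kind of information loss the abstract says the partial dilation step is meant to counteract.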

Support structure representation learning for sequential data clustering

Wang, Xiumei; Guo, Dingning; Cheng, Peitao

10 pages

Abstract: Sequential data clustering is a challenging task in data mining (e.g., motion recognition and video segmentation). For good performance in dealing with the complex local correlation and high-dimensional structure of sequential data, representation-based methods have become one of the hot topics for sequential data clustering, in which subspace clustering is a representative tool. Subspace clustering methods divide the sequence into disjoint segments according to a locally continuous and connected representation of raw data. Although the subspace clustering methods maintain the successive property of sequential data well, there exist redundant connections in the intersection of two subsequences, which will destroy the integrity of a cluster and easily cause the chained partition of the sequence. So it is necessary to learn a more specific structure representation of a sequence to preserve both sequential information and efficient connections. Besides, a representation that is conducive to clustering should have sparsity and connectivity under some assumptions. To this end, we propose a novel method to learn the support structure representation of a sequence, which can extract sufficient information about instances and get the compact structure of sequential data. Furthermore, a new subspace clustering method is proposed based on the representation-based method. Theoretical analysis and experimental results show the effectiveness of the proposed method. (c) 2021 Elsevier Ltd. All rights reserved.

Multi-task framework based on feature separation and reconstruction for cross-modal retrieval

Zhang, Li; Wu, Xiangqian

10 pages

Abstract: Cross-modal retrieval has become a hot research topic in both computer vision and natural language processing. Learning an intermediate common space for features of different modalities has become one of the mainstream methods. In this paper, we propose a novel multi-task framework based on feature separation and reconstruction (mFSR) for cross-modal retrieval based on common space learning methods, which introduces a feature separation module to deal with information asymmetry between different modalities, and introduces an image and text reconstruction module to improve the quality of the feature separation module. Extensive experiments on the MS-COCO and Flickr30K datasets demonstrate that feature separation and specific information reconstruction can significantly improve the baseline performance of cross-modal image-caption retrieval. (c) 2021 Elsevier Ltd. All rights reserved.

Online learnable keyframe extraction in videos and its application with semantic word vector in action recognition

Elahi, G. M. Mashrur E.; Yang, Yee-Hong

12 pages

Abstract: Video processing has become a popular research direction in computer vision due to its various applications such as video summarization, action recognition, etc. Recently, deep learning-based methods have achieved impressive results in action recognition. However, these methods need to process a full video sequence to recognize the action, even though many of the frames in the video sequence are similar and non-essential to recognizing a particular action. Additionally, these non-essential frames increase the computational cost and can confuse a method in action recognition. In contrast, the important frames, called keyframes, not only help in recognizing an action but can also reduce the processing time of each video sequence in classification or in other applications, e.g. summarization. Moreover, current methods in video processing have not yet been demonstrated in an online fashion. Motivated by the above, we propose an online learnable module for keyframe extraction. This module can be used to select key shots in video and thus can be applied to video summarization. The extracted keyframes can be used as input to any deep learning-based classification model to recognize action. We also propose a plugin module to use the semantic word vector as input along with keyframes and a novel train/test strategy for the classification models. To the best of our knowledge, this is the first time such an online module and train/test strategy have been proposed. The experimental results on many commonly used datasets in video summarization and in action recognition have demonstrated the effectiveness of the proposed module. (c) 2021 Elsevier Ltd. All rights reserved.

Well-calibrated confidence measures for multi-label text classification with a large number of labels

Maltoudoglou, Lysimachos; Paisios, Andreas; Lenc, Ladislav; Martinek, Jiri...

21 pages

Abstract: We extend our previous work on Inductive Conformal Prediction (ICP) for multi-label text classification and present a novel approach for addressing the computational inefficiency of the Label Powerset (LP) ICP, arising when dealing with a high number of unique labels. We present experimental results using the original and the proposed efficient LP-ICP on two English and one Czech language data-sets. Specifically, we apply the LP-ICP on three deep Artificial Neural Network (ANN) classifiers of two types: one based on contextualised (BERT) and two on non-contextualised (word2vec) word-embeddings. In the LP-ICP setting we assign nonconformity scores to label-sets from which the corresponding p-values and prediction-sets are determined. Our approach deals with the increased computational burden of LP by eliminating from consideration a significant number of label-sets that will surely have p-values below the specified significance level. This reduces dramatically the computational complexity of the approach while fully respecting the standard CP guarantees. Our experimental results show that the contextualised-based classifier surpasses the non-contextualised-based ones and obtains state-of-the-art performance for all data-sets examined. The good performance of the underlying classifiers is carried on to their ICP counterparts without any significant accuracy loss, but with the added benefits of ICP, i.e. the confidence information encapsulated in the prediction sets. We experimentally demonstrate that the resulting prediction sets can be tight enough to be practically useful even though the set of all possible label-sets contains more than 1e+16 combinations. Additionally, the empirical error rates of the obtained prediction-sets confirm that our outputs are well-calibrated. (c) 2021 Elsevier Ltd. All rights reserved.
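
The step from nonconformity scores to p-values and prediction sets can be sketched with the standard inductive conformal p-value, p = (number of calibration scores greater than or equal to the candidate's score + 1) / (n + 1). The sketch below assumes made-up calibration scores and candidate label-sets, and the function names are hypothetical; it does not reproduce the paper's nonconformity measure or its method for discarding label-sets in advance.

```python
def icp_p_value(calibration_scores, candidate_score):
    """Standard ICP p-value: share of scores at least as nonconforming as the candidate."""
    n_at_least = sum(1 for a in calibration_scores if a >= candidate_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, candidate_scores, significance):
    """Keep every candidate label-set whose p-value exceeds the significance level."""
    return {
        label_set
        for label_set, score in candidate_scores.items()
        if icp_p_value(calibration_scores, score) > significance
    }


# Made-up nonconformity scores of a held-out calibration set (n = 9).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Made-up scores for three candidate label-sets of one test document.
candidates = {
    ("sports",): 0.25,
    ("sports", "politics"): 0.85,
    ("politics",): 0.95,
}

for label_set, score in candidates.items():
    print(label_set, icp_p_value(calibration, score))

# At significance 0.15 the last candidate (p-value 0.1) is excluded.
print(prediction_set(calibration, candidates, significance=0.15))
```

Scoring every candidate this way is what becomes infeasible when the number of possible label-sets is very large, which is the inefficiency the abstract says the proposed approach addresses.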

Continuous conditional random field convolution for point cloud segmentation

Yang, Fei; Davoine, Franck; Wang, Huan; Jin, Zhong...

13 pages

Abstract: Point cloud segmentation is the foundation of 3D environmental perception for modern intelligent systems. For this problem, as for image segmentation, conditional random fields (CRFs) are usually formulated as discrete models in label space to encourage label consistency, which is actually a kind of postprocessing. In this paper, we reconsider the CRF in feature space for point cloud segmentation because it can capture the structure of features well to improve the representation ability of features rather than simply smoothing. Therefore, we first model the point cloud features with a continuous quadratic energy model and formulate its solution process as a message-passing graph convolution, by which it can be easily integrated into a deep network. We theoretically demonstrate that the message passing in the graph convolution is equivalent to the mean-field approximation of a continuous CRF model. Furthermore, we build an encoder-decoder network based on the proposed continuous CRF graph convolution (CRFConv), in which the CRFConv embedded in the decoding layers can restore the details of high-level features that were lost in the encoding stage to enhance the location ability of the network, thereby benefiting segmentation. Analogous to the CRFConv, we show that the classical discrete CRF can also work collaboratively with the proposed network via another graph convolution to further improve the segmentation results. Experiments on various point cloud benchmarks demonstrate the effectiveness and robustness of the proposed method. Compared with the state-of-the-art methods, the proposed method can also achieve competitive segmentation performance. (c) 2021 Elsevier Ltd. All rights reserved.

A near effective and efficient model in recognition

Li, Chaobo; Suen, Ching Y.; Li, Hongjun; Zhou, Ze...

14 pages

Abstract: Neuro-fuzzy models have been applied in various domains, in which the issues of long time consumption for optimizing parameters and little innovation in fuzzy methods for feature extraction remain to be solved. Here, we present a novel cycle reinforce hierarchical model (CRHM) for effective and efficient recognition. The innovative strategies of CRHM consist of the hierarchical structure, the groups of fuzzy subsystems and the cycle mechanism. The hierarchical structure is innovatively built to extract features and transform the low-level features into advanced ones semantically, in which we adopt the groups of fuzzy subsystems as feature extraction units in each hidden layer, which ensures the diversity of features, avoids the fuzzy rules explosion, and reduces the time for clustering. The cycle mechanism is first proposed to connect the hierarchical structure and the output layer directly, transferring the tuned parameters again and again, to reinforce features gradually. To demonstrate the performance of CRHM, we have conducted extensive comparison with several state-of-the-art algorithms on benchmark 1D and 2D datasets. The experimental results show that the recognition rate of CRHM is higher than that of a convolutional neural network (CNN), while the training time is only 5% of the CNN's, which confirms that our approach provides a novel model for recognition that can simultaneously improve effectiveness and efficiency without the need for advanced equipment. In addition, the analysis of the contribution of the core strategies to CRHM performance indicates that the contribution of the hierarchical structure is greater than that of the groups of fuzzy subsystems, which is in turn greater than that of the cycle mechanism. (c) 2021 Published by Elsevier Ltd.

Discriminative feature generation for classification of imbalanced data

Suh, Sungho; Lukowicz, Paul; Lee, Yong Oh

13 pages

Abstract: The data imbalance problem is a frequent bottleneck in the classification performance of neural networks. In this paper, we propose a novel supervised discriminative feature generation (DFG) method for a minority class dataset. DFG is based on the modified structure of a generative adversarial network consisting of four independent networks: generator, discriminator, feature extractor, and classifier. To augment the selected discriminative features of the minority class data by adopting an attention mechanism, the generator for the class-imbalanced target task is trained, and the feature extractor and classifier are regularized using the pre-trained features from a large source dataset. The experimental results show that the DFG generator enhances the augmentation of the label-preserved and diverse features, and the classification results are significantly improved on the target task. The feature generation model can contribute greatly to the development of data augmentation methods through discriminative feature generation and supervised attention methods. (c) 2021 Elsevier Ltd. All rights reserved.

Learning panoptic segmentation through feature discriminability

Chu, Tao; Cai, Wenjie; Liu, Qiong

11 pages

Abstract: Panoptic segmentation has attracted increasing attention as a joint task of semantic and instance segmentation. However, previous works have not noticed that the different requirements for semantic and instance segmentation can lead to a conflict of feature discriminability. Instance segmentation mainly focuses on the central area of each instance in things regions, while semantic segmentation focuses on the whole region of a specific class. To resolve this conflict, we propose: 1) a Dual-FPN framework which separates the shared Feature Pyramid Network (FPN) in previous works to reduce the conflict of receptive field and meet different requirements of the two tasks; 2) a Region Refinement Module which leverages the prediction of semantic segmentation to refine the result of instance segmentation and resolves the conflict between the things regions and the stuff regions. Experimental results on the Cityscapes and Mapillary Vistas datasets show that our proposed method can improve the results for both things and stuff and obtain state-of-the-art performance. (c) 2021 Published by Elsevier Ltd.

Efficient k-nearest neighbor search based on clustering and adaptive k values

Gallego, Antonio Javier; Rico-Juan, Juan Ramon; Valero-Mas, Jose J.

17 pages

Abstract: The k-Nearest Neighbor (kNN) algorithm is widely used in the supervised learning field and, particularly, in search and classification tasks, owing to its simplicity, competitive performance, and good statistical properties. However, its inherent inefficiency prevents its use in most modern applications due to the vast amount of data that the current technological evolution generates, which makes the optimization of kNN-based search strategies of particular interest. This paper introduces the caKD+ algorithm, which tackles this limitation by combining the use of feature learning techniques, clustering methods, adaptive search parameters per cluster, and the use of pre-calculated K-Dimensional Tree structures, and results in a highly efficient search method. This proposal has been evaluated using 10 datasets and the results show that caKD+ significantly outperforms 16 state-of-the-art efficient search methods while remaining as accurate as the exhaustive kNN search. (c) 2021 Elsevier Ltd. All rights reserved.
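
The general idea of restricting a kNN search to one cluster and using a cluster-specific k can be sketched as follows. This is not caKD+: it assumes fixed, hand-picked centroids, a linear scan inside the cluster instead of a K-Dimensional Tree, no feature learning, and a hand-set `k_per_cluster` table. All names and data are hypothetical.

```python
import math
from collections import Counter


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def build_index(points, labels, centroids):
    """Group the labeled points by their nearest centroid."""
    index = {i: [] for i in range(len(centroids))}
    for point, label in zip(points, labels):
        index[nearest_centroid(point, centroids)].append((point, label))
    return index


def classify(query, index, centroids, k_per_cluster):
    """Majority vote among the k nearest points of the query's cluster only."""
    cluster_id = nearest_centroid(query, centroids)
    k = k_per_cluster[cluster_id]  # assumed per-cluster k, set by hand here
    members = sorted(index[cluster_id], key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in members[:k])
    return votes.most_common(1)[0][0]


# Made-up 2D data around two hand-picked centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0, 1), (1, 0), (1, 1), (9, 10), (10, 9), (11, 11)]
labels = ["a", "a", "b", "b", "b", "a"]
k_per_cluster = {0: 3, 1: 1}

index = build_index(points, labels, centroids)
print(classify((0.2, 0.2), index, centroids, k_per_cluster))
print(classify((10.0, 9.5), index, centroids, k_per_cluster))
```

Each query is compared against three stored points instead of six; the saving grows with the number of clusters, at the cost of possibly missing true neighbors that fall in an adjacent cluster. The sketch also assumes no cluster is empty.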
