Journal information
Neural Networks
Pergamon Press
ISSN 0893-6080

Neural Networks (indexed in SCI, AHCI, EI, ISTP)
Officially published

    Two-stage streaming keyword detection and localization with multi-scale depthwise temporal convolution

    Hou, Jingyong; Xie, Lei; Zhang, Shilei
    15 pages
    Abstract: A keyword spotting (KWS) system running on smart devices should accurately detect the appearances and predict the locations of predefined keywords in audio streams, with a small footprint and high efficiency. To this end, this paper proposes a new two-stage KWS method which combines a novel multi-scale depthwise temporal convolution (MDTC) feature extractor with a two-stage keyword detection and localization module. The MDTC feature extractor learns multi-scale feature representations efficiently with dilated depthwise temporal convolution, modeling both the temporal context and speech rate variation. We use a region proposal network (RPN) as the first-stage KWS. At each frame, we design multiple time regions, which all take the current frame as the end position but have different start positions. These time regions (formally, anchors) indicate rough candidate locations of a keyword. With frame-level features from the MDTC feature extractor as inputs, the RPN learns to generate keyword region proposals based on the designed anchors. To alleviate the keyword/non-keyword class imbalance problem, we introduce a hard example mining algorithm to select effective negative anchors in RPN training. The keyword region proposals from the first-stage RPN contain keyword location information, which is subsequently used to explicitly extract keyword-related sequential features to train the second-stage KWS. The second-stage system learns to classify each region proposal into a keyword ID and to transform it toward the ground-truth keyword region. Experiments on the Google Speech Command dataset show that the proposed MDTC feature extractor surpasses several competitive feature extractors with a new state-of-the-art command classification error rate of 1.74%. With the MDTC feature extractor, we further conduct wake-up word (WuW) detection and localization experiments on a commercial WuW dataset. Compared to a strong baseline, our proposed two-stage method achieves a 27-32% relative improvement in false rejection rate at one false alarm per hour, while for keyword localization, the two-stage approach achieves a mean intersection-over-union ratio of more than 0.95, which is clearly better than the one-stage RPN method. (C) 2022 Elsevier Ltd. All rights reserved.
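
The anchor design and the intersection-over-union measure mentioned in this abstract can be illustrated with a minimal sketch. The function names, the anchor lengths, and the inclusive frame-interval convention are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def make_anchors(end_frame, lengths):
    """Build candidate regions that all end at end_frame (inclusive).

    Each anchor is a (start, end) pair; anchors that would start before
    frame 0 are dropped. The lengths are illustrative assumptions.
    """
    anchors = []
    for n in lengths:
        start = end_frame - n + 1
        if start >= 0:
            anchors.append((start, end_frame))
    return anchors


def iou_1d(a, b):
    """Intersection-over-union of two inclusive frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


# Hypothetical example: four anchors ending at frame 99 and a keyword
# whose true region spans frames 50..99.
anchors = make_anchors(end_frame=99, lengths=[20, 40, 60, 80])
truth = (50, 99)
best = max(anchors, key=lambda a: iou_1d(a, truth))
```

In a setup like this, an anchor with high overlap against the true region would typically serve as a positive training example, and a second stage would then refine its boundaries.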

    Quasi-synchronization of fractional-order multi-layer networks with mismatched parameters via delay-dependent impulsive feedback control

    Xu, Yao; Liu, Jingjing; Li, Wenxue
    15 pages
    Abstract: This paper investigates the quasi-synchronization of fractional-order multi-layer networks with mismatched parameters under delay-dependent impulsive feedback control. Fractional-order multi-layer networks with mismatched parameters are constructed in this paper as an extension of single-layer and two-layer models. Both intra-layer and inter-layer couplings are taken into consideration, which is more general and rarely considered in discussions of network synchronization. An extended fractional differential inequality with impulsive effects is given to establish the framework and theory for the quasi-synchronization problem under delay-dependent impulsive feedback control. Moreover, using the Lyapunov method and graph theory, two criteria for achieving the quasi-synchronization of fractional-order multi-layer networks with mismatched parameters are derived. Furthermore, exponential convergence rates as well as the bounds of the quasi-synchronization errors are deduced. Finally, the theoretical results are applied to a practical power system, and some illustrative examples are presented to show the effectiveness of the theoretical analysis. (C) 2022 Elsevier Ltd. All rights reserved.

    Unsupervised feature selection via adaptive autoencoder with redundancy control

    Gong, Xiaoling; Yu, Ling; Wang, Jian; Zhang, Kai; ...
    15 pages
    Abstract: Unsupervised feature selection is an efficient approach to reducing the dimension of unlabeled high-dimensional data. We present a novel adaptive autoencoder with redundancy control (AARC) as an unsupervised feature selector. By adding two Group Lasso penalties to the objective function, AARC integrates unsupervised feature selection and determination of a compact network structure into a single framework. In addition, a penalty based on a measure of dependency between features (such as Pearson correlation or mutual information) is added to the objective function to control the level of redundancy in the selected features. To realize the desired effects of different regularizers in different phases of training, we introduce adaptive parameters which change with iterations. A smoothing function is used to approximate the three penalties, since they are not differentiable at the origin. An ablation study is carried out to validate the redundancy control and structure optimization capabilities of AARC. Subsequently, comparisons with nine state-of-the-art methods illustrate the efficiency of AARC for unsupervised feature selection. (C) 2022 Elsevier Ltd. All rights reserved.
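
To make the non-differentiability point concrete, here is a minimal sketch of a Group Lasso penalty and one common smooth surrogate. The surrogate sqrt(||w||^2 + eps^2) and the value of eps are assumptions chosen for illustration; the abstract does not say which smoothing function AARC uses.

```python
import math


def group_lasso(groups):
    """Sum of Euclidean norms of the weight groups (not smooth at zero)."""
    return sum(math.sqrt(sum(w * w for w in g)) for g in groups)


def smoothed_group_lasso(groups, eps=1e-3):
    """Smooth surrogate: sqrt(||w_g||^2 + eps^2) for each group.

    This particular surrogate is an assumption, not the paper's choice.
    """
    return sum(math.sqrt(sum(w * w for w in g) + eps * eps) for g in groups)


def smoothed_grad(group, eps=1e-3):
    """Gradient of the surrogate for one group; well defined even at zero."""
    norm = math.sqrt(sum(w * w for w in group) + eps * eps)
    return [w / norm for w in group]


# Hypothetical weights: one active group and one group that is exactly zero.
groups = [[3.0, 4.0], [0.0, 0.0]]
exact = group_lasso(groups)
smooth = smoothed_group_lasso(groups)
grad_at_zero = smoothed_grad(groups[1])
```

In a feature selection setting, each group would usually collect the weights leaving one input feature, so that driving a whole group to zero removes that feature.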

    Robust multi-view subspace clustering based on consensus representation and orthogonal diversity

    Zhao, Nan; Bu, Jie
    10 pages
    Abstract: The main purpose of multi-view subspace clustering is to reveal the intrinsic low-dimensional structure of data points according to their multi-view characteristics. Exploring the potential relationships among views is one of the most essential research focuses of the multi-view task. To better utilize the complementary and consistent information from distinct views, we propose a novel robust multi-view subspace clustering approach based on consensus representation and orthogonal diversity (RMSCCO). A newly defined orthogonality term is adopted to improve the diversity and decrease the redundancy of the learned subspace representations. The consensus representation and subspace learning are integrated into one unified framework to characterize the consistency across views. A grouping-enhanced representation is used to maintain the local geometric structure of the original data space. An l2,1-norm regularizer on the noise is applied to improve robustness. Finally, an optimization algorithm based on the Alternating Direction Method of Multipliers (ADMM) is used to solve RMSCCO. Extensive experimental results on six challenging datasets demonstrate that our approach achieves highly competitive performance.

    Attributes learning network for generalized zero-shot learning

    Yun, Yu; Wang, Sen; Hou, Mingzhen; Gao, Quanxue; ...
    7 pages
    Abstract: In the absence of training data for unseen classes, zero-shot learning algorithms use the semantic knowledge shared by the seen and unseen classes to establish a connection between the visual space and the semantic space, so as to recognize the unseen classes. However, in real applications, the original semantic representation cannot adequately characterize both the class-specificity structure and the discriminative information in dimension space, which leads to unseen classes being easily misclassified as seen classes. To tackle this problem, we propose a Salient Attributes Learning Network (SALN) to generate a discriminative and expressive semantic representation under the supervision of the visual features. Meanwhile, an l(1,2)-norm constraint is employed so that the learned semantic representation characterizes the class-specificity structure and discriminative information in dimension space. A feature alignment network then projects the learned semantic representation into the visual space, and a relation network is adopted for classification. The proposed approach improves performance on five benchmark datasets in the generalized zero-shot learning task, and in-depth experiments indicate the effectiveness of our method. (C) 2022 Elsevier Ltd. All rights reserved.

    SelfVIO: Self-supervised deep monocular Visual-Inertial Odometry and depth estimation

    Almalioglu, Yasin; Turan, Mehmet; Saputra, Muhamad Risqi U.; de Gusmao, Pedro P. B.; ...
    18 pages
    Abstract: In the last decade, numerous supervised deep learning approaches have been proposed for visual-inertial odometry (VIO) and depth map estimation, which require large amounts of labelled data. To overcome the data limitation, self-supervised learning has emerged as a promising alternative that exploits constraints such as geometric and photometric consistency in the scene. In this study, we present a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns the joint estimation of 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabelled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without requiring IMU intrinsic parameters and/or extrinsic calibration between the IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework and compare its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC and Cityscapes datasets. Detailed comparisons show that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature. (C) 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

    Augmented Graph Neural Network with hierarchical global-based residual connections

    Rassil, Asmaa; Chougrad, Hiba; Zouaki, Hamid
    18 pages
    Abstract: Graph Neural Networks (GNNs) are powerful architectures for learning on graphs. They are efficient at predicting node, link and graph properties. Standard GNN variants follow a message passing scheme to iteratively update node representations using information from higher-order neighborhoods. Consequently, deeper GNNs make it possible to define high-level node representations based on local as well as distant neighborhoods. However, deeper networks are prone to over-smoothing. To build deeper GNN architectures and avoid losing the dependency between lower layers (those closer to the input) and higher layers (those closer to the output), networks can integrate residual connections to connect intermediate layers. We propose the Augmented Graph Neural Network (AGNN) model with hierarchical global-based residual connections. Using the proposed residual connections, the model generates high-level node representations without the need for a deeper architecture. We show that the node representations generated by our proposed AGNN model are able to define an expressive, all-encompassing representation of the entire graph. As such, the graph predictions generated by the AGNN model considerably surpass state-of-the-art results. Moreover, we carry out extensive experiments to identify the best global pooling strategy and attention weights for defining adequate hierarchical and global-based residual connections for different graph property prediction tasks. Furthermore, we propose a reversible variant of the AGNN model to address the extensive memory consumption that typically arises from training networks on large and dense graph datasets. The proposed Reversible Augmented Graph Neural Network (R-AGNN) stores only the node representations acquired from the output layer, as opposed to saving all representations from intermediate layers, as is conventionally done when optimizing the parameters of other GNNs. We further refine the definition of the backpropagation algorithm to fit the R-AGNN model. We evaluate the proposed AGNN and R-AGNN models on benchmark molecular, bioinformatics and social network datasets for graph classification and achieve state-of-the-art results. For instance, the AGNN model achieves improvements of +39% on IMDB-MULTI, reaching 91.7% accuracy, and +16% on COLLAB, reaching 96.8% accuracy, compared to other GNN variants. (C) 2022 Elsevier Ltd. All rights reserved.

    A novel ramp loss-based multi-task twin support vector machine with multi-parameter safe acceleration

    Pang, Xinying; Zhao, Jiang; Xu, Yitian
    19 pages
    Abstract: Direct multi-task twin support vector machine (DMTSVM) is an effective algorithm for multi-task classification problems. However, the generated hyperplane may shift toward outliers because DMTSVM uses the hinge loss. Therefore, we propose an improved multi-task model, RaMTTSVM, based on the ramp loss to handle noisy points more effectively. It explicitly limits the maximal loss value and thus bounds the influence of noise. However, RaMTTSVM is non-convex and is solved by the concave-convex procedure (CCCP), which requires solving a series of approximate convex problems and may therefore be time-consuming. Motivated by the sparse solution of RaMTTSVM, we further propose a safe acceleration rule, MSA, to speed up the solving process. Based on optimality conditions and convex optimization theory, MSA can delete many inactive samples, corresponding to zero elements in the dual solutions, before solving the model. The computation can then be accelerated by solving only the reduced problems. The rule contains three parts that correspond to different parameters and different iteration phases of CCCP. It can be used not only for the first approximate convex problem of CCCP but also for the successive problems during the iteration process. More importantly, MSA is safe in the sense that the reduced problem yields the same optimal solution as the original problem, so the prediction accuracy is not affected. Experimental results on one artificial dataset, ten benchmark datasets, ten image datasets and one real wine dataset confirm the generalization and acceleration ability of our proposed algorithm. (C) 2022 Elsevier Ltd. All rights reserved.
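
The contrast between hinge and ramp loss described in this abstract can be sketched as follows. The sketch uses the common formulation of the ramp loss as a difference of two hinge functions, which is also what makes CCCP applicable; the parameter s and its default value are assumptions, since the abstract does not give the exact parameterization used in RaMTTSVM.

```python
def hinge_at(z, a):
    """Shifted hinge function max(0, a - z), convex in the margin z."""
    return max(0.0, a - z)


def hinge_loss(z):
    """Standard hinge loss; grows without bound as the margin z decreases."""
    return hinge_at(z, 1.0)


def ramp_loss(z, s=-1.0):
    """Ramp loss as a difference of two convex hinges, capped at 1 - s.

    The truncation point s < 1 is an illustrative assumption.
    """
    return hinge_at(z, 1.0) - hinge_at(z, s)


# Hypothetical margins: an extreme outlier, a point inside the margin,
# and a point classified correctly with room to spare.
margins = [-10.0, 0.5, 2.0]
hinge_values = [hinge_loss(z) for z in margins]
ramp_values = [ramp_loss(z) for z in margins]
```

The outlier at margin -10 contributes 11 under the hinge loss but only 2 under the ramp loss, which is the bounded-influence property the abstract refers to.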

    Event-triggered delayed impulsive control for nonlinear systems with application to complex neural networks

    Wang, Mingzhu; Li, Xiaodi; Duan, Peiyong
    9 pages
    Abstract: This paper studies the Lyapunov stability of nonlinear systems and the synchronization of complex neural networks in the framework of event-triggered delayed impulsive control (ETDIC), where the effect of time delays in impulses is fully considered. Based on a Lyapunov-based event-triggered mechanism (ETM), some sufficient conditions are presented to avoid Zeno behavior and achieve global asymptotic stability of the addressed system. In the framework of event-triggered impulsive control (ETIC), control input is generated only at state-dependent triggered instants, and there is no control input between two consecutive triggered impulse instants, which can greatly reduce resource consumption and control waste. The contributions of this paper can be summarized as follows. Firstly, compared with classical ETIC, our results not only provide a well-designed ETM to determine the impulse time sequence, but also fully extract the information of time delays in impulses and integrate it into the dynamic analysis of the system. Secondly, it is shown that the time delays in impulses exhibit positive effects in our results; that is, they may contribute to stabilizing a system and achieving better performance. Thirdly, as an application of ETDIC strategies, we apply the proposed theoretical results to the synchronization problem of complex neural networks. Some sufficient conditions ensuring the synchronization of complex neural networks are presented, in which the information of time delays in impulses is fully exploited. Finally, two numerical examples are provided to show the effectiveness and validity of the theoretical results. (C) 2022 Elsevier Ltd. All rights reserved.

    Learning online visual invariances for novel objects via supervised and self-supervised training

    Biscione, Valerio; Bowers, Jeffrey S.
    15 pages
    Abstract: Humans can identify objects following various spatial transformations such as scale and viewpoint. This extends to novel objects, after a single presentation at a single pose, sometimes referred to as online invariance. CNNs have been proposed as a compelling model of human vision, but their ability to identify objects across transformations is typically tested on held-out samples of trained categories after extensive data augmentation. This paper assesses whether standard CNNs can support human-like online invariance by training models to recognize images of synthetic 3D objects that undergo several transformations: rotation, scaling, translation, brightness, contrast, and viewpoint. Through the analysis of models' internal representations, we show that standard supervised CNNs trained on transformed objects can acquire strong invariances on novel classes even when trained with as few as 50 objects taken from 10 classes. This extended to a different dataset of photographs of real objects. We also show that these invariances can be acquired in a self-supervised way, through solving the same/different task. We suggest that this latter approach may be similar to how humans acquire invariances. Crown Copyright (C) 2022 Published by Elsevier Ltd. All rights reserved.