Verduzco-Flores, SergioDorrell, WilliamDe Schutter, Erik
22页
查看更多>>摘要:In this paper we explore a neural control architecture that is both biologically plausible, and capable of fully autonomous learning. It consists of feedback controllers that learn to achieve a desired state by selecting the errors that should drive them. This selection happens through a family of differential Hebbian learning rules that, through interaction with the environment, can learn to control systems where the error responds monotonically to the control signal. We next show that in a more general case, neural reinforcement learning can be coupled with a feedback controller to reduce errors that arise non-monotonically from the control signal. The use of feedback control can reduce the complexity of the reinforcement learning problem, because only a desired value must be learned, with the controller handling the details of how it is reached. This makes the function to be learned simpler, potentially allowing learning of more complex actions. We use simple examples to illustrate our approach, and discuss how it could be extended to hierarchical architectures. (c) 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
查看更多>>摘要:It has been observed that design choices of neural networks are often crucial for their successful optimization. In this article, we therefore discuss the question if it is always possible to redesign a neural network so that it trains well with gradient descent. This yields the following universality result: If, for a given network, there is any algorithm that can find good network weights for a classification task, then there exists an extension of this network that reproduces the same forward model by mere gradient descent training. The construction is not intended for practical computations, but it provides some orientation on the possibilities of pre-trained networks in meta-learning and related approaches. (C)& nbsp;2022 Elsevier Ltd. All rights reserved.
Li, HongmingDou, RanKeil, AndreasPrincipe, Jose C....
19页
查看更多>>摘要:Inspired by the human vision system and learning, we propose a novel cognitive architecture that understands the content of raw videos in terms of objects without using labels. The architecture achieves four objectives: (1) Decomposing raw frames in objects by exploiting foveal vision and memory. (2) Describing the world by projecting objects on an internal canvas. (3) Extracting relevant objects from the canvas by analyzing the causal relation between objects and rewards. (4) Exploiting the information of relevant objects to facilitate the reinforcement learning (RL) process. In order to speed up learning, and better identify objects that produce rewards, the architecture implements learning by causality from the perspective of Wiener and Granger using object trajectories stored in working memory and the time series of external rewards. A novel non-parametric estimator of directed information using Renyi's entropy is designed and tested. Experiments on three environments show that our architecture extracts most of relevant objects. It can be thought of as 'understanding' the world in an object-oriented way. As a consequence, our architecture outperforms state-of-the-art deep reinforcement learning in terms of training speed and transfer learning. (C) 2022 Elsevier Ltd. All rights reserved.
查看更多>>摘要:Building a human-like integrative artificial cognitive system, that is, an artificial general intelligence (AGI), is the holy grail of the artificial intelligence (AI) field. Furthermore, a computational model that enables an artificial system to achieve cognitive development will be an excellent reference for brain and cognitive science. This paper describes an approach to develop a cognitive architecture by integrating elemental cognitive modules to enable the training of the modules as a whole. This approach is based on two ideas: (1) brain-inspired AI, learning human brain architecture to build human-level intelligence, and (2) a probabilistic generative model (PGM)-based cognitive architecture to develop a cognitive system for developmental robots by integrating PGMs. The proposed development framework is called a whole brain PGM (WB-PGM), which differs fundamentally from existing cognitive architectures in that it can learn continuously through a system based on sensory-motor information.& nbsp;In this paper, we describe the rationale for WB-PGM, the current status of PGM-based elemental cognitive modules, their relationship with the human brain, the approach to the integration of the cognitive modules, and future challenges. Our findings can serve as a reference for brain studies. As PGMs describe explicit informational relationships between variables, WB-PGM provides interpretable guidance from computational sciences to brain science. By providing such information, researchers in neuroscience can provide feedback to researchers in AI and robotics on what the current models lack with reference to the brain. Further, it can facilitate collaboration among researchers in neuro-cognitive sciences as well as AI and robotics. (C)& nbsp;2022 The Author(s). Published by Elsevier Ltd.
查看更多>>摘要:This paper proposes a new hierarchical approach to learning rate adaptation in gradient methods, called learning rate optimization (LRO). LRO formulates the learning rate adaption problem as a hierarchical optimization problem that minimizes the loss function with respect to the learning rate for current model parameters and gradients. Then, LRO optimizes the learning rate based on the alternating direction method of multipliers (ADMM). In the process of this learning rate optimization, LRO does not require any second-order information and probabilistic model, so it is highly efficient. Furthermore, LRO does not require any additional hyperparameters when compared to the vanilla gradient method with the simple exponential learning rate decay. In the experiments, we integrated LRO with vanilla SGD and Adam. Then, we compared their optimization performance with the stateof-the-art learning rate adaptation methods and also the most commonly-used adaptive gradient methods. The SGD and Adam with LRO outperformed all the competitors on the benchmark datasets in image classification tasks. (c) 2022 The Author(s). Published by Elsevier Ltd.
查看更多>>摘要:Consider that the constrained convex optimization problems have emerged in a variety of scientific and engineering applications that often require efficient and fast solutions. Inspired by the Nesterov's accelerated method for solving unconstrained convex and strongly convex optimization problems, in this paper we propose two novel accelerated projection neurodynamic approaches for constrained smooth convex and strongly convex optimization based on the variational approach. First, for smooth, and convex optimization problems, a non-autonomous accelerated projection neurodynamic approach (NAAPNA) is presented and the existence, uniqueness and feasibility of the solution to it are analyzed rigorously. We provide that the NAAPNA has a convergence rate which is inversely proportional to the square of the running time. In addition, we present a novel autonomous accelerated projection neurodynamic approach (AAPNA) for addressing the constrained, smooth, strongly convex optimization problems and prove the existence, uniqueness to the strong global solution of AAPNA based on the Cauchy-Lipschitz-Picard theorem. Furthermore, we also prove the global convergence of AAPNA with different exponential convergence rates for different parameters. Compared with existing projection neurodynamic approaches based on the Brouwer's fixed point theorem, both NAAPNA and AAPNA use the projection operators of the auxiliary variable to map the primal variables to the constrained feasible region, thus our proposed neurodynamic approaches are easier to realize algorithm's acceleration. Finally, the effectiveness of NAAPNA and AAPNA is illustrated with several numerical examples. (C)& nbsp;2022 Published by Elsevier Ltd.
查看更多>>摘要:Deep Neural Networks (DNNs) have been vastly and successfully employed in various artificial intelligence and machine learning applications (e.g., image processing and natural language processing). As DNNs become deeper and enclose more filters per layer, they incur high computational costs and large memory consumption to preserve their large number of parameters. Moreover, present processing platforms (e.g., CPU, GPU, and FPGA) have not enough internal memory, and hence external memory storage is needed. Hence deploying DNNs on mobile applications is difficult, considering the limited storage space, computation power, energy supply, and real-time processing requirements. In this work, using a method based on tensor decomposition, network parameters were compressed, thereby reducing access to external memory. This compression method decomposes the network layers' weight tensor into a limited number of principal vectors such that (i) almost all the initial parameters can be retrieved, (ii) the network structure did not change, and (iii) the network quality after reproducing the parameters was almost similar to the original network in terms of detection accuracy. To optimize the realization of this method on FPGA, the tensor decomposition algorithm was modified while its convergence was not affected, and the reproduction of network parameters on FPGA was straightforward. The proposed algorithm reduced the parameters of ResNet50, VGG16, and VGG19 networks trained with Cifar10 and Cifar100 by almost 10 times. (C)& nbsp;2022 Elsevier Ltd. All rights reserved.
查看更多>>摘要:In this paper, a novel dual-channel system for multi-class text emotion recognition has been proposed, and a novel technique to explain its training & predictions has been developed. The architecture of the proposed system contains the embedding module, dual-channel module, emotion classification module, and explainability module. The embedding module extracts the textual features from the input sentences in the form of embedding vectors using the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. Then the embedding vectors are fed as the inputs to the dual-channel network containing two network channels made up of convolutional neural network (CNN) and bidirectional long short term memory (BiLSTM) network. The intuition behind using CNN and BiLSTM in both the channels was to harness the goodness of the convolutional layer for feature extraction and the BiLSTM layer to extract text's order and sequence-related information. The outputs of both channels are in the form of embedding vectors which are concatenated and fed to the emotion classification module. The proposed system's architecture has been determined by thorough ablation studies, and a framework has been developed to discuss its computational cost. The emotion classification module learns and projects the emotion embeddings on a hyperplane in the form of clusters. The proposed explainability technique explains the training and predictions of the proposed system by analyzing the inter & intra-cluster distances and the intersection of these clusters. The proposed approach's consistent accuracy, precision, recall, and F1 score results for ISEAR, Aman, AffectiveText, and EmotionLines datasets, ensure its applicability to diverse texts.(C)& nbsp;& nbsp;2022 Elsevier Ltd. All rights reserved.
查看更多>>摘要:If left untreated, Alzheimer's disease (AD) is a leading cause of slowly progressive dementia. Therefore, it is critical to detect AD to prevent its progression. In this study, we propose a bidirectional progressive recurrent network with imputation (BiPro) that uses longitudinal data, including patient demographics and biomarkers of magnetic resonance imaging (MRI), to forecast clinical diagnoses and phenotypic measurements at multiple timepoints. To compensate for missing observations in the longitudinal data, we use an imputation module to inspect both temporal and multivariate relations associated with the mean and forward relations inherent in the time series data. To encode the imputed information, we define a modification of the long short-term memory (LSTM) cell by using a progressive module to compute the progression score of each biomarker between the given timepoint and the baseline through a negative exponential function. These features are used for the prediction task. The proposed system is an end-to-end deep recurrent network that can accomplish multiple tasks at the same time, including (1) imputing missing values, (2) forecasting phenotypic measurements, and (3) predicting the clinical status of a patient based on longitudinal data. We experimented on 1,335 participants from The Alzheimer's Disease Prediction of Longitudinal Evolution (TADPOLE) challenge cohort. The proposed method achieved a mean area under the receiver-operating characteristic curve (mAUC) of 78% for predicting the clinical status of patients, a mean absolute error (MAE) of 3.5ml for forecasting MRI biomarkers, and an MAE of 6.9ml for missing value imputation. The results confirm that our proposed model outperforms prevalent approaches, and can be used to minimize the progression of Alzheimer's disease.(C) 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).