Abstract: © 2021 Elsevier Ltd. Learning to interact with the environment not only empowers the agent with manipulation capability but also generates information that facilitates building action understanding and imitation capabilities. This seems to be a strategy adopted by biological systems, in particular primates, as evidenced by the existence of mirror neurons that seem to be involved in multi-modal action understanding. How robots can benefit from their interaction experience to understand the actions and goals of other agents is still a challenging question. In this study, we propose a novel method, deep modality blending networks (DMBN), that creates a common latent space from the multi-modal experience of a robot by blending multi-modal signals with a stochastic weighting mechanism. We show for the first time that deep learning, when combined with a novel modality blending scheme, can facilitate action recognition and produce structures to sustain anatomical and effect-based imitation capabilities. Our proposed system, which is based on conditional neural processes, can be conditioned on any desired sensory/motor value at any time step, and can generate a complete multi-modal trajectory consistent with the desired conditioning in one shot by querying the network for all the sampled time points in parallel, which avoids the accumulation of prediction errors. Based on simulation experiments with an arm-gripper robot and an RGB camera, we showed that DMBN could make accurate predictions about any missing modality (camera or joint angles) given the available ones, outperforming recent multimodal variational autoencoder models in terms of long-horizon high-dimensional trajectory predictions. We further showed that given desired images from different perspectives, i.e. images generated by the observation of other robots placed on different sides of the table, our system could generate image and joint angle sequences that correspond to either anatomical or effect-based imitation behavior. To achieve this mirror-like behavior, our system does not perform pixel-based template matching but rather relies on the common latent space constructed from both joint and image modalities, as shown by additional experiments. Moreover, we showed that mirror learning (in our system) does not depend only on visual experience and cannot be achieved without proprioceptive experience. Our experiments showed that out of ten training scenarios with different initial configurations, the proposed DMBN model achieved mirror learning in all of the cases, whereas the model that uses only visual information failed in half of them. Overall, the proposed DMBN architecture not only serves as a computational model for sustaining mirror neuron-like capabilities, but also stands as a powerful machine learning architecture for high-dimensional multi-modal temporal data, with robust retrieval capabilities that operate with partial information in one or multiple modalities.
Abstract: © 2021 Elsevier Ltd. Shannon's entropy or an extension of Shannon's entropy can be used to quantify information transmission between or among variables. Mutual information is the pair-wise information that captures nonlinear relationships between variables. It is more robust than linear correlation methods. Beyond mutual information, two generalizations are defined for multivariate distributions: interaction information (or co-information) and total correlation (or multi-mutual information). In comparison to mutual information, interaction information and total correlation are underutilized and poorly studied in applied neuroscience research. In particular, how interaction information and total correlation quantify information flow between brain regions has not been explicitly explained in neuroscience. This article aims to clarify the distinctions between mutual information, interaction information, and total correlation as they are used in neuroscience. Additionally, we propose a novel method for determining the interaction information between three variables using total correlation and conditional mutual information, and we discuss how to apply it properly in practical situations. We carried out both simulation experiments and real neural studies to estimate functional connectivity in the brain with the above three higher-order information-theoretic approaches. We found that interaction information and total correlation were both robust in capturing redundant information among multiple variables, and that they could capture both well-known and yet-to-be-discovered functional brain connections.
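The three quantities named in the abstract above can be illustrated on a small discrete example. The following is a minimal sketch using the standard textbook definitions, not the estimator proposed in the paper. The function names and the XOR test distribution are illustrative choices, and the sign convention used here (co-information = I(X;Y) - I(X;Y|Z)) is an assumption, since the literature also uses the opposite sign.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-np.sum(p * np.log2(p)))

def marginal(joint, keep):
    """Sum a joint pmf over every axis that is not listed in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint if not drop else joint.sum(axis=drop)

def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return (entropy(marginal(joint, (a,))) + entropy(marginal(joint, (b,)))
            - entropy(marginal(joint, (a, b))))

def conditional_mutual_information(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(C) - H(A,B,C)."""
    return (entropy(marginal(joint, (a, c))) + entropy(marginal(joint, (b, c)))
            - entropy(marginal(joint, (c,))) - entropy(marginal(joint, (a, b, c))))

def total_correlation(joint):
    """TC = sum of single-variable entropies minus the joint entropy."""
    singles = sum(entropy(marginal(joint, (ax,))) for ax in range(joint.ndim))
    return singles - entropy(joint)

def co_information(joint):
    """Three-variable co-information under the assumed sign convention."""
    return (mutual_information(joint, 0, 1)
            - conditional_mutual_information(joint, 0, 1, 2))

# Illustrative joint pmf: X and Y are independent fair bits, Z = X XOR Y.
xor_joint = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        xor_joint[x, y, x ^ y] = 0.25

pairwise_sum = (mutual_information(xor_joint, 0, 1)
                + mutual_information(xor_joint, 0, 2)
                + mutual_information(xor_joint, 1, 2))
print("total correlation:", total_correlation(xor_joint))
print("co-information:", co_information(xor_joint))
print("sum of pairwise MI:", pairwise_sum)
```

In the XOR case every pairwise mutual information is zero while the total correlation is positive, which shows why pairwise measures alone can miss higher-order dependence.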
Abstract: © 2021 Elsevier Ltd. This paper presents an inertial neural network to solve the source localization optimization problem with an l1-norm objective function based on the time of arrival (TOA) localization technique. The convergence and stability of the inertial neural network are analyzed by the Lyapunov function method. An inertial neural network iterative approach is further used to find a better solution among the solutions obtained with different inertial parameters. Furthermore, clock asynchronization is considered in the TOA l1-norm model for more general real applications, and the corresponding inertial neural network iterative approach is addressed. Both numerical simulations and real data are considered in the experiments. In the simulation experiments, the noise contains uncorrelated zero-mean Gaussian noise and uniformly distributed outliers. In the real experiments, the data are obtained using ultra-wideband (UWB) hardware modules. Whether or not there is clock asynchronization, the results show that the proposed approach always finds a more accurate source position than some of the existing algorithms, which implies that the proposed approach is more effective than the compared ones.
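To make the l1-norm TOA objective mentioned above concrete, here is a minimal sketch of one common form of that cost: the sum of absolute differences between measured ranges and the distances from a candidate position to the anchors. The exact model in the paper (including the clock-asynchronization term and the inertial network dynamics) is not reproduced here. The function name, anchor layout, and source position are hypothetical.

```python
import numpy as np

def toa_l1_objective(x, anchors, ranges):
    """Sum over anchors of | ||x - a_i|| - d_i |, an l1-norm range residual."""
    x = np.asarray(x, dtype=float)
    dists = np.linalg.norm(anchors - x, axis=1)  # distance to each anchor
    return float(np.sum(np.abs(dists - ranges)))

# Hypothetical setup: four anchors at the corners of a 10 x 10 square.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_source = np.array([3.0, 4.0])

# Noise-free ranges, then a copy with one gross outlier on the first anchor.
ranges = np.linalg.norm(anchors - true_source, axis=1)
ranges_with_outlier = ranges.copy()
ranges_with_outlier[0] += 5.0

print("cost at true position, clean ranges:",
      toa_l1_objective(true_source, anchors, ranges))
print("cost at true position, one outlier:",
      toa_l1_objective(true_source, anchors, ranges_with_outlier))
print("cost at a wrong position, clean ranges:",
      toa_l1_objective([6.0, 6.0], anchors, ranges))
```

An outlier enters this cost linearly rather than quadratically, which is the usual motivation for preferring an l1-norm residual when measurements contain outliers.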
Abstract: © 2021 Elsevier Ltd. This paper focuses on improving the spatial resolution of hyperspectral images (HSIs) by taking prior information into consideration. In recent years, single HSI super-resolution (SR) methods based on deep learning have achieved good performance. However, most of them simply apply general image super-resolution deep networks to hyperspectral data, thus ignoring some specific characteristics of the hyperspectral data itself. In order to make full use of the spectral information of the HSI, we transform the HSI SR problem from the image domain into the abundance domain using a dilated projection correction network with an autoencoder, termed aeDPCN. In particular, we first encode the low-resolution HSI into an abundance representation and preserve the spectral information in the decoder network, which largely reduces the computational complexity. Then, to enhance the spatial resolution of the abundance embedding, we super-resolve the embedding in a coarse-to-fine manner with the dilated projection correction network, where a back-projection strategy is introduced to further eliminate spectral distortion. Finally, the predicted images are derived by the same decoder, which increases the stability of our method, even at a large upscaling factor. Extensive experiments on real hyperspectral image scenes demonstrate the superiority of our method over the state of the art in terms of accuracy and efficiency.
Abstract: © 2021 Elsevier Ltd. Dense video captioning aims to automatically describe several events that occur in a given video, which most state-of-the-art models accomplish by locating and describing multiple events in an untrimmed video. Despite much progress in this area, most current approaches encode only visual features in the event location phase and neglect the relationships between events, which may degrade the consistency of the descriptions within the same video. Thus, in the present study, we attempted to exploit visual-audio cues to generate event proposals and to enhance event-level representations by capturing their temporal and semantic relationships. Furthermore, to compensate for the major limitation of not fully utilizing multi-modal information in the description process, we developed an attention-gating mechanism that dynamically fuses and regulates the multi-modal information. In summary, we propose an event-centric multi-modal fusion approach for dense video captioning (EMVC) to capture the relationships between events and effectively fuse multi-modal information. We conducted comprehensive experiments to evaluate the performance of EMVC on the benchmark ActivityNet Caption and YouCook2 data sets. The experimental results showed that our model achieved impressive performance compared with state-of-the-art methods.
Abstract: © 2021 The Author(s). Modifying the existing models of classifier operation is primarily aimed at increasing their effectiveness and minimizing the training time. An additional advantage is the ability to quickly adapt a given solution to the real needs of the market. In this paper, we propose a method that can implement various classifiers using the federated learning concept while taking parallelism into account. Another important element is the analysis and selection of the best classifier depending on its reliability, determined on separate datasets extended by new, augmented samples. The proposed augmentation technique involves image processing techniques, neural architectures, and heuristic methods, and it improves the operation of federated learning by increasing the role of the server. The proposal is presented and tested on the fruit image classification problem. The conducted experiments show that the described technique can be very useful as an implementation method even in the case of a small database. The obtained results are discussed with respect to their advantages, such as higher accuracy, and their disadvantages in the context of practical application.
Abstract: © 2021 Elsevier Ltd. This work investigates the stability and dissipativity problems for neural networks with time-varying delay. By constructing new augmented Lyapunov–Krasovskii functionals based on an integral inequality and using the zero equality approach, three improved results are proposed in the form of linear matrix inequalities. Based on the stability results, the dissipativity of neural networks with time-varying delays is then analyzed. Through some numerical examples, the superiority and effectiveness of the proposed results are shown by comparison with existing works.
Abstract: © 2021 Elsevier Ltd. Deep neural networks have unlocked a vast range of new applications by solving tasks, many of which were previously deemed reserved for higher human intelligence. One of the developments enabling this success was a boost in computing power provided by special-purpose hardware, such as graphics or tensor processing units. However, these do not leverage fundamental features of neural networks like parallelism and analog state variables. Instead, they emulate neural networks relying on binary computing, which results in unsustainable energy consumption and comparatively low speed. Fully parallel and analog hardware promises to overcome these challenges, yet the impact of analog neuron noise and its propagation, i.e. accumulation, threatens to render such approaches ineffective. Here, we determine for the first time the propagation of noise in deep neural networks comprising noisy nonlinear neurons in trained fully connected layers. We study additive and multiplicative as well as correlated and uncorrelated noise, and develop analytical methods that predict the noise level in any layer of symmetric deep neural networks or deep neural networks trained with back propagation. We find that noise accumulation is generally bounded, and that adding network layers does not worsen the signal-to-noise ratio beyond a limit. Most importantly, noise accumulation can be suppressed entirely when neuron activation functions have a slope smaller than unity. We therefore develop a framework for noise in fully connected deep neural networks implemented in analog systems, and identify criteria allowing engineers to design noise-resilient novel neural network hardware.
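The role of the activation slope in the abstract above can be illustrated with a deliberately simplified toy model. The sketch below is not the paper's analytical framework. It assumes a chain with one neuron per layer, a linear activation of constant slope, and independent additive noise of fixed variance injected at every layer. All names and numbers are hypothetical. Under those assumptions the noise variance follows the recursion v(n+1) = slope² · v(n) + σ².

```python
def noise_variance_per_layer(slope, sigma2, n_layers):
    """Noise variance after each layer of a toy one-neuron-per-layer chain.

    Assumes a linear activation with constant `slope` and independent
    additive noise of variance `sigma2` injected at every layer.
    """
    variances = []
    v = 0.0  # the input is taken to be noise-free
    for _ in range(n_layers):
        v = slope ** 2 * v + sigma2  # scale inherited noise, add fresh noise
        variances.append(v)
    return variances

# Slope below unity: the variance saturates at sigma2 / (1 - slope**2).
contracting = noise_variance_per_layer(slope=0.5, sigma2=1.0, n_layers=60)
# Slope equal to unity: the variance grows by sigma2 with every layer.
marginal_case = noise_variance_per_layer(slope=1.0, sigma2=1.0, n_layers=10)

print("slope 0.5, last layer:", contracting[-1])
print("slope 1.0, last layer:", marginal_case[-1])
```

In this toy setting a slope below one makes each layer attenuate the noise it inherits, so depth stops mattering after a few layers. The toy only demonstrates boundedness and does not cover the multiplicative or correlated noise cases studied in the paper.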
Abstract: © 2021 The Author(s). Graph construction plays an essential role in graph-based label propagation since graphs give some information on the structure of the data manifold. While most graph construction methods rely on a predefined distance calculation, recent algorithms merge the tasks of label propagation and graph construction into a single process. Moreover, the use of several descriptors has been shown to outperform a single descriptor in representing the relations between the nodes. In this article, we propose a Multiple-View Consistent Graph construction and Label propagation algorithm (MVCGL) that simultaneously constructs a consistent graph based on several descriptors and performs label propagation over unlabeled samples. Furthermore, it provides a mapping function from the feature space to the label space, with which we estimate the labels of unseen samples via a linear projection. The constructed graph does not rely on a predefined similarity function and exploits data and label smoothness. Experiments conducted on three face databases and one handwritten digit database show that the proposed method achieves better performance than other graph construction and label propagation methods.
Abstract: © 2021 The Author(s). In this work, we introduce, justify and demonstrate the Corrective Source Term Approach (CoSTA), a novel approach to Hybrid Analysis and Modeling (HAM). The objective of HAM is to combine physics-based modeling (PBM) and data-driven modeling (DDM) to create generalizable, trustworthy, accurate, computationally efficient and self-evolving models. CoSTA achieves this objective by augmenting the governing equation of a PBM model with a corrective source term generated using a deep neural network (DNN). In a series of numerical experiments on one-dimensional heat diffusion, CoSTA is found to outperform comparable DDM and PBM models in terms of accuracy, often reducing predictive errors by several orders of magnitude, while also generalizing better than pure DDM. Due to its flexible but solid theoretical foundation, CoSTA provides a modular framework for leveraging novel developments within both PBM and DDM. Its theoretical foundation also ensures that CoSTA can be used to model any system governed by (deterministic) partial differential equations. Moreover, CoSTA facilitates interpretation of the DNN-generated source term within the context of PBM, which results in improved explainability of the DNN. These factors make CoSTA a potential door-opener for data-driven techniques to enter high-stakes applications previously reserved for pure PBM.
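The idea of augmenting a governing equation with a corrective source term can be sketched for one-dimensional heat diffusion as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses an implicit Euler finite-difference step with homogeneous Dirichlet boundaries, and the `correction` callable is a hypothetical stand-in for the trained neural network (here replaced by a fixed function). The grid size, time step, and diffusivity are arbitrary.

```python
import numpy as np

def implicit_euler_step(u, dt, dx, alpha, source, correction=None):
    """Advance u_t = alpha * u_xx + source by one implicit Euler step.

    `u` holds interior grid values only; both boundary values are fixed at 0.
    `correction` is a hypothetical callable mapping the current state to an
    extra source term, standing in for a learned corrective model.
    """
    n = u.size
    r = alpha * dt / dx ** 2
    # Tridiagonal system matrix (I - dt * alpha * Laplacian), stored densely.
    A = (1 + 2 * r) * np.eye(n) - r * np.eye(n, k=1) - r * np.eye(n, k=-1)
    rhs = u + dt * source
    if correction is not None:
        rhs = rhs + dt * correction(u)  # corrective source term added here
    return np.linalg.solve(A, rhs)

# Hypothetical setup: 9 interior nodes on the unit interval.
n, dx, dt, alpha = 9, 0.1, 0.01, 1.0
u0 = np.zeros(n)
zero_source = np.zeros(n)

# Pure physics-based step: nothing drives the system, so it stays at rest.
u_pbm = implicit_euler_step(u0, dt, dx, alpha, zero_source)

# Corrected step: a placeholder "network" that always outputs a unit source.
unit_correction = lambda state: np.ones_like(state)
u_corrected = implicit_euler_step(u0, dt, dx, alpha, zero_source,
                                  correction=unit_correction)

# The correction acts exactly like a physical source of the same magnitude.
u_reference = implicit_euler_step(u0, dt, dx, alpha, np.ones(n))
print("max |corrected - reference|:", np.max(np.abs(u_corrected - u_reference)))
```

Because the correction enters the discretized equation in the same place as a physical source, its output can be read in the same units as the other terms of the equation, which is one way to understand the interpretability claim in the abstract.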