Journal information
Neural Networks
Publisher: Pergamon Press
ISSN: 0893-6080
Indexed in: SCI, AHCI, EI, ISTP
Officially published

    Joint learning adaptive metric and optimal classification hyperplane

    Wang, Yidan; Yang, Liming
    10 pages
    Abstract: Metric learning has attracted much interest in classification tasks due to its efficient performance. Most traditional metric learning methods rely on k-nearest neighbor (kNN) classifiers to make decisions, but the choice of k affects generalization. In this work, we propose an end-to-end metric learning framework. Specifically, a new linear metric learning model (LMML) is first proposed to jointly learn adaptive metrics and optimal classification hyperplanes, where dissimilar samples are separated by maximizing the classification margin. Then a nonlinear metric learning model (called RLMML) is developed based on a bounded nonlinear kernel function to extend LMML. The non-convexity of the proposed models makes them difficult to optimize. Half-quadratic optimization algorithms are developed to solve the problems iteratively, alternately optimizing the optimal classification hyperplane and the adaptive metric. Moreover, the resulting algorithms are proved to be theoretically convergent. Numerical experiments on different types of data sets show the effectiveness of the proposed algorithms. Finally, the Wilcoxon test also shows the feasibility and effectiveness of the proposed models. © 2022 Elsevier Ltd. All rights reserved.
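The joint metric-plus-hyperplane idea can be illustrated with a toy sketch (this is not the authors' actual LMML objective or their half-quadratic solver): a linear map A, inducing the metric M = AᵀA, and a hyperplane (w, b) are updated together by subgradient descent on a hinge loss. All data and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: two Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# Jointly learn a linear map A (inducing the metric M = A^T A) and a
# separating hyperplane (w, b) by subgradient descent on a hinge loss.
A = np.eye(2)
w = np.zeros(2)
b = 0.0
lr = 0.005
for _ in range(300):
    Z = X @ A.T                        # samples in the learned metric space
    margins = y * (Z @ w + b)
    V = margins < 1.0                  # margin violators
    gw = -(y[V, None] * Z[V]).sum(0) + 0.1 * w
    gb = -y[V].sum()
    gA = -np.outer(w, (y[V, None] * X[V]).sum(0))   # d(w . Ax)/dA = w x^T
    w -= lr * gw
    b -= lr * gb
    A -= lr * gA

acc = np.mean(np.sign(X @ A.T @ w + b) == y)
print(f"training accuracy: {acc:.2f}")
```

The alternating structure of the paper's algorithm (hyperplane step, then metric step) is collapsed here into simultaneous gradient updates purely for brevity.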

    On minimal representations of shallow ReLU networks

    Dereich, Steffen; Kassing, Sebastian
    8 pages
    Abstract: The realization function of a shallow ReLU network is a continuous and piecewise affine function f : R^d -> R, where the domain R^d is partitioned by a set of n hyperplanes into cells on which f is affine. We show that the minimal representation for f uses either n, n + 1 or n + 2 neurons, and we characterize each of the three cases. In the particular case where the input layer is one-dimensional, minimal representations always use at most n + 1 neurons, but in all higher-dimensional settings there are functions for which n + 2 neurons are needed. Then we show that the set of minimal networks representing f forms a C^∞-submanifold M, and we derive the dimension and the number of connected components of M. Additionally, we give a criterion for the hyperplanes that guarantees that a continuous, piecewise affine function is the realization function of an appropriate shallow ReLU network. © 2022 Elsevier Ltd. All rights reserved.
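The neuron counts in this abstract can be checked numerically in the one-dimensional case: |x| is piecewise affine with n = 1 breakpoint, its minimal shallow representation uses n + 1 = 2 neurons, and a redundant 3-neuron network (chosen here just for illustration) realizes exactly the same function.

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

# f(x) = |x| is piecewise affine with a single breakpoint (n = 1 hyperplane).
# Minimal shallow representation: n + 1 = 2 neurons.
f_minimal = lambda x: relu(x) + relu(-x)

# A redundant 3-neuron network realizing the same function: the first
# ReLU unit is split into two units sharing the same breakpoint.
f_redundant = lambda x: 0.3 * relu(x) + 0.7 * relu(x) + relu(-x)

xs = np.linspace(-5, 5, 1001)
max_gap = np.max(np.abs(f_minimal(xs) - f_redundant(xs)))
print(max_gap)  # the two networks realize the same function
```

The paper's submanifold M of minimal networks corresponds here to the freedom in rescaling each unit's input and output weights without changing the realization.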

    Dual Global Enhanced Transformer for image captioning

    Xian, Tiantao; Li, Zhixin; Zhang, Canlong; Ma, Huifang, et al.
    13 pages
    Abstract: Transformer-based architectures have shown great success in image captioning, where the self-attention module can model source and target interactions (e.g., object-to-object, object-to-word, word-to-word). However, global information is not explicitly considered in the attention weight calculation, which is essential for understanding the scene content. In this paper, we propose the Dual Global Enhanced Transformer (DGET) to incorporate global information in the encoding and decoding stages. Concretely, in DGET, we regard the grid feature as visual global information and adaptively fuse it into region features in each layer by a novel Global Enhanced Encoder (GEE). During decoding, we propose a Global Enhanced Decoder (GED) to explicitly utilize textual global information. First, we devise a context encoder to encode an existing caption, generated by a classic captioner, as a context vector. Then, we use the context vector to guide the decoder to generate accurate words at each time step. To validate our model, we conduct extensive experiments on the MS COCO image captioning dataset and achieve superior performance over many state-of-the-art methods. © 2022 Elsevier Ltd. All rights reserved.
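The "adaptively fuse global into region features" step can be sketched as a learned gate; the gating form, shapes, and parameters below are illustrative assumptions, not the paper's GEE.

```python
import numpy as np

rng = np.random.default_rng(4)

n_regions, d = 6, 8
regions = rng.standard_normal((n_regions, d))   # region features
grid_global = rng.standard_normal(d)            # pooled grid feature as global context

# Hypothetical gating parameters (learned in a real model).
W_r = 0.1 * rng.standard_normal((d, d))
W_g = 0.1 * rng.standard_normal((d, d))

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# A per-region, per-channel gate decides how much global context to mix in.
gate = sigmoid(regions @ W_r.T + grid_global @ W_g.T)   # (n_regions, d)
fused = gate * regions + (1.0 - gate) * grid_global
print(fused.shape)  # (6, 8): globally enhanced region features
```

In a real encoder this fusion would be applied inside every layer, with the gate parameters trained end to end alongside the attention weights.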

    Signed network representation with novel node proximity evaluation

    Xu, Pinghua; Hu, Wenbin; Wu, Jia; Liu, Weiwei, et al.
    13 pages
    Abstract: Signed network representation has been applied to many fields, e.g., recommendation platforms. A mainstream paradigm of network representation is to map nodes onto a low-dimensional space such that the node proximity of interest is preserved. Thus, a key aspect is node proximity evaluation. Accordingly, three new node proximity metrics are proposed in this study, based on a rigorous theoretical investigation of a new distance metric, the signed average first passage time (SAFT). SAFT is derived from a basic random-walk quantity for unsigned networks and can capture high-order network structure and edge signs. We conducted network representation using the proposed proximity metrics and empirically demonstrated their advantages in two downstream tasks: sign prediction and link prediction. The code is publicly available. © 2022 Elsevier Ltd. All rights reserved.
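The basic random-walk quantity that SAFT builds on, the average first passage time on an unsigned graph, is standard and can be computed by solving a small linear system (the signed extension itself is not reproduced here).

```python
import numpy as np

# Average first passage time on a small unsigned graph: the basic
# random-walk quantity underlying the paper's signed SAFT metric.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # node 3 hangs off node 2
P = A / A.sum(axis=1, keepdims=True)        # random-walk transition matrix

def first_passage_times(P, target):
    """Expected steps to reach `target` from every other node:
    solve (I - Q) m = 1, where Q drops the target row and column."""
    n = len(P)
    keep = [i for i in range(n) if i != target]
    Q = P[np.ix_(keep, keep)]
    m = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    out = np.zeros(n)
    out[keep] = m
    return out

m = first_passage_times(P, target=3)
print(m)   # nodes farther from node 3, in random-walk terms, wait longer
```

Note the asymmetry of first passage times is why papers typically average the two directions to obtain a distance-like quantity.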

    Low-degree term first in ResNet, its variants and the whole neural network family

    Sun, Tongfeng; Ding, Shifei; Guo, Lili
    11 pages
    Abstract: To explain the working mechanism of ResNet and its variants, this paper proposes a novel argument of shallow subnetwork first (SSF), essentially low-degree term first (LDTF), which also applies to the whole neural network family. A neural network with shortcut connections behaves as an ensemble of subnetworks of differing depths. Among these subnetworks, the shallow subnetworks are trained first and have great effects on the performance of the network. The shallow subnetworks roughly correspond to low-degree polynomials, while the deep subnetworks correspond to high-degree polynomials. Based on Taylor expansion, SSF is consistent with LDTF. ResNet is in line with Taylor expansion: shallow subnetworks are trained first to capture low-degree terms, avoiding overfitting; deep subnetworks try to maintain high-degree terms, ensuring high description capacity. Experiments on ResNets and DenseNets show that shallow subnetworks are trained first and play important roles in the training of the networks. The experiments also reveal why DenseNets outperform ResNets: the subnetworks playing vital roles in the training of the former are shallower than those in the training of the latter. Furthermore, LDTF can also be used to explain the working mechanism of other ResNet variants (SE-ResNets and SK-ResNets) and common phenomena occurring in many neural networks. © 2022 Elsevier Ltd. All rights reserved.
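The subnetwork-ensemble view becomes an exact identity when the residual blocks are linear, an assumption made purely for illustration here: (I + W2)(I + W1)x = x + W1x + W2x + W2W1x, a sum over paths of depth 0, 1, 1 and 2, with the shallow (low-degree) paths typically dominating at small weight scales.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
# Two linear residual blocks with small weights (as after typical init).
W1 = 0.1 * rng.standard_normal((d, d))
W2 = 0.1 * rng.standard_normal((d, d))
x = rng.standard_normal(d)

# Full two-block ResNet: x -> (I + W2)(I + W1) x.
full = x + W1 @ x + W2 @ (x + W1 @ x)

# Expansion into subnetwork "paths": identity (depth 0), each single
# block (depth 1), and the composed deep path (depth 2).
paths = x + W1 @ x + W2 @ x + W2 @ (W1 @ x)
print(np.max(np.abs(full - paths)))  # 0 up to float rounding

# With 0.1-scale weights the shallow paths typically carry most of the output.
norms = [np.linalg.norm(x),
         np.linalg.norm(W1 @ x + W2 @ x),
         np.linalg.norm(W2 @ (W1 @ x))]
print(norms)
```

With nonlinear blocks the expansion is no longer exact, which is why the paper argues via Taylor expansion rather than an algebraic identity.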

    Not every sample is efficient: Analogical generative adversarial network for unpaired image-to-image translation

    Zheng, Ziqiang; Yang, Jie; Yu, Zhibin; Wang, Yubo, et al.
    10 pages
    Abstract: Image translation aims to learn an effective mapping function that converts an image from a source domain to a target domain. With the proposal and further development of generative adversarial networks (GANs), generative models have achieved great breakthroughs. Image-to-image (I2I) translation methods mainly fall into two categories: paired and unpaired. Paired methods usually require a large number of input-output sample pairs to perform one-sided image translation, which heavily limits their practicality. To address the lack of paired samples, CycleGAN and its extensions utilize the cycle-consistency loss to provide an elegant and generic solution for unpaired I2I translation between two domains. This thread of dual learning-based methods usually adopts a random sampling strategy for optimization and does not consider the content similarity between samples. However, not every sample is efficient and effective for the desired optimization or leads to optimal convergence. Inspired by analogical learning, which utilizes the relationships and similarities between sample observations, we propose a novel generic metric-based sampling strategy to effectively select samples from different domains for training. Besides, we introduce a novel analogical adversarial loss to force the model to learn from effective samples and alleviate the influence of negative samples. Experimental results on various vision tasks demonstrate the superior performance of the proposed method. The proposed method is also a generic framework that can easily be extended to other I2I translation methods and yield a performance gain. © 2022 Elsevier Ltd. All rights reserved.
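A minimal sketch of metric-based sampling in the spirit of this abstract (the concrete metric, cosine similarity, and the random features are assumptions, not the paper's): each source sample trains against its most content-similar partner from the other domain instead of a random one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical content features for samples from two unpaired domains.
feats_a = rng.standard_normal((20, 8))
feats_b = rng.standard_normal((30, 8))

def cosine_sim(A, B):
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

sim = cosine_sim(feats_a, feats_b)          # (20, 30) similarity matrix
# For each source sample, select the most content-similar target sample
# rather than a random one.
partner = sim.argmax(axis=1)
selected = sim[np.arange(len(feats_a)), partner]

random_pairs = sim[np.arange(len(feats_a)), rng.integers(0, 30, 20)]
print(selected.mean(), random_pairs.mean())  # selected pairs are more similar
```

In a full system the features would come from a pretrained encoder and the selection would feed the adversarial training loop.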

    Cross-modal distribution alignment embedding network for generalized zero-shot learning

    Li, Qin; Hou, Mingzhen; Lai, Hong; Yang, Ming, et al.
    7 pages
    Abstract: Many approaches in generalized zero-shot learning (GZSL) rely on cross-modal mapping between the image feature space and the class embedding space, which achieves knowledge transfer from seen to unseen classes. However, these two spaces are completely different and their manifolds are inconsistent, so existing methods suffer from highly overlapping semantic descriptions of different classes; in GZSL tasks, unseen classes can easily be misclassified as seen classes. To handle these problems, we adopt a novel semantic embedding network which helps encode more discriminative information from initial semantic attributes into semantic embeddings in the visual space. Meanwhile, a distribution alignment constraint is adopted to keep the distribution of the learned semantic embeddings consistent with the distribution of real image features. Moreover, an auxiliary classifier is adopted to strengthen the quality of the learned semantic embeddings. Finally, a relation network is used to classify the unseen images by computing relation scores between the semantic embeddings and image features, which is much more flexible than fixed distance metric functions. Experimental results demonstrate that our proposed method is superior to other state-of-the-art methods. © 2022 Published by Elsevier Ltd.
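One simple way to instantiate a distribution alignment constraint like the one described is first- and second-moment matching between the learned semantic embeddings and real image features; this is an assumed form, and the paper's exact constraint may differ.

```python
import numpy as np

def moment_alignment_loss(emb, feat):
    """Penalize mean and covariance mismatch between two sample sets."""
    mu_gap = emb.mean(0) - feat.mean(0)
    cov_gap = np.cov(emb, rowvar=False) - np.cov(feat, rowvar=False)
    return float(mu_gap @ mu_gap + (cov_gap ** 2).sum())

rng = np.random.default_rng(3)
feats = rng.standard_normal((100, 5))
aligned = feats.copy()                # identical distribution
shifted = feats + 2.0                 # mean-shifted embeddings
print(moment_alignment_loss(aligned, feats))   # 0.0
print(moment_alignment_loss(shifted, feats))   # ≈ 20: mean shift of 2 in 5 dims
```

Such a term would be added to the embedding network's training objective alongside the classification and relation losses.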

    Deep Rival Penalized Competitive Learning for low-resolution face recognition

    Li, Peiying; Tu, Shikui; Xu, Lei
    11 pages
    Abstract: Current face recognition tasks are usually carried out on high-quality face images, but in reality most face images are captured under unconstrained or poor conditions, e.g., by video surveillance. Existing methods either learn data uncertainty to avoid overfitting the noise, or add margins to the angle or cosine space of the normalized softmax loss to penalize the target logit, which enforces intra-class compactness and inter-class discrepancy. In this paper, we propose deep Rival Penalized Competitive Learning (RPCL) for deep face recognition in low-resolution (LR) images. Inspired by the idea of RPCL, our method further enforces regulation on the rival logit, which is defined as the largest non-target logit for an input image. Different from existing methods that only consider penalization of the target logit, our method not only strengthens learning toward the target label, but also enforces a reverse direction, i.e., de-learning away from the rival label. Comprehensive experiments demonstrate that our method improves the existing state-of-the-art methods and is very robust for LR face recognition. © 2022 Elsevier Ltd. All rights reserved.
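The rival-logit idea can be sketched as a loss combining softmax cross-entropy toward the target with a de-learning penalty on the rival, the largest non-target logit; the additive form and weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rpcl_logit_loss(logits, target, lam=0.5):
    # Softmax cross-entropy toward the target label.
    z = logits - logits.max()                  # numerical stability
    ce = -(z[target] - np.log(np.exp(z).sum()))
    # Rival = largest non-target logit; de-learn it by penalizing it.
    rival_mask = np.ones_like(logits, dtype=bool)
    rival_mask[target] = False
    rival = logits[rival_mask].max()
    return ce + lam * rival

logits = np.array([2.0, 0.5, 1.5])
loss_plain = rpcl_logit_loss(logits, target=0, lam=0.0)   # plain cross-entropy
loss_rpcl = rpcl_logit_loss(logits, target=0, lam=0.5)    # rival (1.5) penalized
print(loss_plain, loss_rpcl)
```

Minimizing the extra term pushes the rival logit down, the "reverse direction" the abstract describes, on top of the usual pull toward the target.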

    Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing

    Mi, Chenggang; Xie, Lei; Zhang, Yanning
    12 pages
    Abstract: A high-quality end-to-end speech translation model relies on a large amount of speech-to-text training data, which is usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource speech translation. In particular, we first generate large-scale target-side paraphrases based on a paraphrase generation model which incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN) feature. Then, a filtering model consisting of semantic similarity and speech-word pair co-occurrence is proposed to select the highest-scoring source speech-target paraphrase pairs from candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian and Swedish paraphrase generation show that the proposed method achieves significant and consistent improvements over several strong baseline models on the PPDB datasets (http://paraphrase.org/). To introduce the paraphrase generation results into low-resource speech translation, we propose two strategies: audio-text pair recombination and multiple-reference training. Experimental results show that speech translation models trained on new audio-text datasets which combine the paraphrase generation results yield substantial improvements over baselines, especially for low-resource languages. © 2022 Elsevier Ltd. All rights reserved.

    GARAT: Generative Adversarial Learning for Robust and Accurate Tracking

    Yao, Bowen; Li, Jing; Xue, Shan; Wu, Jia, et al.
    13 pages
    Abstract: Object tracking by Siamese networks has gained popularity for its outstanding performance and considerable potential. However, most existing Siamese architectures face great difficulties in scenes where the target undergoes dramatic shape or environmental changes. In this work, we propose a novel and concise generative adversarial learning method to solve the problem, especially when the target undergoes drastic appearance changes, illumination variations and background clutter. We consider the above situations as distractors for tracking and add a distractor generator to the traditional Siamese network. This component can simulate these distractors, and more robust tracking performance is achieved by eliminating the distractors from the input instance search image. Besides, we use the generalized intersection over union (GIoU) as our training loss. GIoU is a stricter metric for bounding box regression than the traditional IoU and can be used as a training loss for more accurate tracking results. Experiments on five challenging benchmarks show favorable, state-of-the-art results against other trackers in different aspects. © 2022 Elsevier Ltd. All rights reserved.
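GIoU itself is a standard quantity and easy to state for axis-aligned boxes: it equals IoU minus the fraction of the smallest enclosing box not covered by the union, so it remains informative (and negative) even for disjoint boxes where IoU is zero.

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Smallest enclosing box C; GIoU subtracts the "wasted" area of C.
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (c_area - union) / c_area

print(giou((0, 0, 2, 2), (0, 0, 2, 2)))   # 1.0 for identical boxes
print(giou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7 - 2/9 ≈ -0.079
print(giou((0, 0, 1, 1), (5, 5, 6, 6)))   # negative for disjoint boxes
```

The corresponding regression loss used in training is typically 1 − GIoU, which, unlike 1 − IoU, still yields a gradient when the predicted and target boxes do not overlap.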