Neural Networks, 2022, Vol. 152. DOI: 10.1016/j.neunet.2022.04.014

Sparse factorization of square matrices with application to neural attention modeling

Khalitov, Ruslan; Yu, Tong; Cheng, Lei

Author information

  • 1. Norwegian University of Science and Technology

Abstract

Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in memory and in time. Therefore, an economical approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint is a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. In the approximation, our method needs only N(log N)^2 non-zero numbers for an N × N full matrix. Our new method is especially useful for scalable neural attention modeling. Different from the conventional scaled dot-product attention methods, we train neural networks to map input data to the non-zero entries of the factorizing matrices. The sparse factorization method is tested on various square matrices, and the experimental results demonstrate that our method gives a better approximation when the approximated matrix is sparse and high-rank. As an attention module, our new method outperforms the Transformer and several of its variants for long sequences on synthetic data sets and on the Long Range Arena benchmarks. Our code is publicly available(2). (C) 2022 The Author(s). Published by Elsevier Ltd.
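The core idea above, replacing a dense N × N matrix with a product of sparse but jointly full-rank factors, can be illustrated with a short sketch. The snippet below is not the authors' implementation: it assumes a simple butterfly-style sparsity pattern with two non-zeros per row in each factor, so its parameter count is on the order of N log N rather than the N(log N)^2 reported in the paper, and it fills the non-zero values randomly instead of learning them from input data as the paper's attention module does.

import numpy as np
import scipy.sparse as sp

def butterfly_factor(n, stride):
    # Sparse n x n factor: row i has non-zeros in columns i and i XOR stride.
    rows = np.repeat(np.arange(n), 2)
    cols = np.empty(2 * n, dtype=int)
    cols[0::2] = np.arange(n)
    cols[1::2] = np.arange(n) ^ stride
    vals = np.random.randn(2 * n)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

N = 1024  # power of two for simplicity
factors = [butterfly_factor(N, 1 << k) for k in range(int(np.log2(N)))]

# Multiply the log2(N) sparse factors; the product is generically dense and
# full rank, while the number of free parameters stays far below N * N.
product = factors[0]
for f in factors[1:]:
    product = product @ f

n_params = sum(f.nnz for f in factors)
print("dense entries:", N * N, " sparse-factor parameters:", n_params)
print("rank of the product:", np.linalg.matrix_rank(product.toarray()))

With N = 1024 this uses roughly 20,000 non-zero parameters instead of about one million dense entries, and the product is still typically full rank, which is the property the abstract contrasts with conventional low-rank factorizations.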

Key words

Matrix factorization; Sparse; Neural networks; Attention modeling; Nyström method


Publication year

2022

Neural Networks

Indexed in: EI, SCI
ISSN: 0893-6080
Cited by: 2
References: 33