Global Covariance Pooling Based on Fast Maximum Singular Value Power Normalization

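Chinese abstract: Recent studies have shown that matrix normalization plays a key role in global covariance pooling, helping to produce more discriminative representations and thereby improving performance on image recognition tasks. Among the various matrix normalization methods, matrix structure-wise normalization makes full use of the geometric structure of the covariance matrix and can therefore achieve better performance. However, structure-wise normalization generally relies on singular value decomposition (SVD) or eigenvalue decomposition (EIG), which are computationally expensive and cannot fully exploit the parallel computing capability of GPUs, so they become a computational bottleneck. Iterative matrix square root normalization (iSQRT) normalizes the covariance matrix via Newton-Schulz iteration and is faster than SVD- and EIG-based methods. However, as the number of iterations and the dimension increase, both the time and memory costs of iSQRT grow significantly, and the method cannot perform normalization with a general power, which limits its range of application. To overcome these shortcomings of iSQRT, this paper proposes a covariance matrix normalization method based on the maximum singular value power. The method divides the covariance matrix by a power of its maximum singular value, and the computation only requires the iterative power method to obtain the matrix's maximum singular value. Detailed ablation experiments show that, compared with iSQRT, the proposed method is faster and uses less GPU memory, is superior in both time and space complexity, and performs comparably to or better than iSQRT. The proposed method achieves leading performance on a large-scale image classification dataset and on fine-grained recognition datasets, reaching 90.7%, 93.3%, and 83.9% on Aircraft, Cars, and Indoor67, respectively, which fully verifies its robustness and generalization ability.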
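The core computation the abstract describes (dividing the covariance matrix by a power of its largest singular value, estimated with power iteration) can be sketched in a few lines. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: it assumes the normalized output is Σ / σ_max^α, and the exponent `alpha`, the iteration count `iters`, the stabilizer `eps`, and all function names are hypothetical choices for illustration. The paper's exact exponent, initialization, and gradient handling are not given in the abstract. Because a covariance matrix is symmetric positive semi-definite, its largest singular value equals its largest eigenvalue, so plain power iteration on Σ suffices.

```python
import math
import random


def covariance(features):
    """Covariance (normalized by n) of a d x n feature matrix: d channels, n positions."""
    d, n = len(features), len(features[0])
    means = [sum(row) / n for row in features]
    centered = [[x - m for x in row] for row, m in zip(features, means)]
    return [[sum(centered[i][k] * centered[j][k] for k in range(n)) / n
             for j in range(d)] for i in range(d)]


def matvec(a, v):
    """Matrix-vector product: O(d^2) work per call."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def max_singular_value(a, iters=20, seed=0):
    """Estimate the largest singular value of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(a))]  # random positive start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    for _ in range(iters):
        w = matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in w))  # ||A v|| approaches sigma_max
        if sigma == 0.0:  # zero matrix: nothing to normalize
            return 0.0
        v = [x / sigma for x in w]
    return sigma


def msvp_normalize(cov, alpha=0.5, iters=20, eps=1e-12):
    """Hypothetical MSVP step: divide cov by sigma_max ** alpha (alpha is an assumption)."""
    sigma = max_singular_value(cov, iters)
    scale = max(sigma, eps) ** alpha  # eps guards against division by zero
    return [[x / scale for x in row] for row in cov]


if __name__ == "__main__":
    feats = [[1.0, 3.0, 2.0, 2.0], [0.0, 1.0, 2.0, 1.0]]
    cov = covariance(feats)
    print(max_singular_value(cov))
    print(msvp_normalize(cov, alpha=0.5))
```

Each power-iteration step needs only a matrix-vector product (O(d²) work), whereas each Newton-Schulz step in iSQRT multiplies full d×d matrices (O(d³) work). This is one plausible reason for the speed and memory advantage the abstract reports, though the paper's own complexity analysis may be framed differently.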

English title: Global Covariance Pooling Based on Fast Maximum Singular Value Power Normalization

English abstract: Recent research shows that matrix normalization plays a key role in global covariance pooling: it helps generate more discriminative representations and thus improves performance on image recognition tasks. Among the various normalization methods, matrix structure-wise normalization makes full use of the geometric structure of the covariance matrix and therefore achieves better performance. However, structure-wise normalization generally depends on singular value decomposition (SVD) or eigenvalue decomposition (EIG), which are computationally expensive and cannot fully exploit the parallel computing capability of GPUs, so they become a computational bottleneck. Iterative matrix square root normalization (iSQRT) uses Newton-Schulz iteration to normalize the covariance matrix and is faster than SVD- and EIG-based methods. However, as the number of iterations and the matrix dimension increase, the time and memory costs of iSQRT grow significantly, and the method cannot perform normalization with a general power, which limits its range of application. To address these problems, we propose a covariance matrix normalization method based on the maximum singular value power: the covariance matrix is divided by a power of its maximum singular value, which requires only the iterative power method to estimate that singular value. Detailed ablation experiments show that, compared with iSQRT, the proposed method is faster, occupies less memory, and is superior in both time and space complexity, while its performance is comparable to or better than that of iSQRT. The proposed method achieves state-of-the-art performance on a large-scale image classification dataset and on fine-grained visual recognition datasets, with accuracies of 90.7%, 93.3%, and 83.9% on Aircraft, Cars, and Indoor67, respectively. These results demonstrate the robustness and generalization ability of the proposed method.
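For comparison, the Newton-Schulz iteration that iSQRT relies on can be sketched as follows. This follows the commonly described iSQRT-COV recipe (trace pre-normalization, coupled Newton-Schulz updates, then multiplication by the square root of the trace); the function names and the iteration count `iters` are illustrative assumptions, and practical implementations run batched on GPU tensors with their own backward pass. Every step multiplies full d×d matrices, which is consistent with the abstract's observation that time and memory grow with the iteration count and the dimension, and the iteration yields only the square root (power 1/2).

```python
import math


def matmul(a, b):
    """Dense matrix product: O(d^3) work for d x d inputs."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def isqrt_newton_schulz(a, iters=10):
    """Approximate A^(1/2) for a symmetric PSD matrix A via coupled Newton-Schulz iteration."""
    d = len(a)
    trace = sum(a[i][i] for i in range(d))
    # Pre-normalize by the trace so all eigenvalues lie in [0, 1], where the iteration converges.
    y = [[x / trace for x in row] for row in a]
    z = identity(d)
    eye = identity(d)
    for _ in range(iters):
        zy = matmul(z, y)
        # t = (3I - Z Y) / 2
        t = [[0.5 * (3.0 * eye[i][j] - zy[i][j]) for j in range(d)] for i in range(d)]
        # Y -> (A / tr A)^(1/2) and Z -> (A / tr A)^(-1/2); both use the previous Y, Z.
        y, z = matmul(y, t), matmul(t, z)
    # Post-compensation: undo the trace scaling.
    s = math.sqrt(trace)
    return [[x * s for x in row] for row in y]


if __name__ == "__main__":
    print(isqrt_newton_schulz([[4.0, 0.0], [0.0, 9.0]]))
```

Raising `iters` or the dimension d increases the number and size of these O(d³) products. Because the update targets only the square root, a different matrix power would require a different algorithm, which is the limitation of iSQRT that the abstract cites.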

English keywords:

Image classification; Global covariance pooling; Matrix power normalization; Maximum singular value power normalization

Authors:

Zeng Ruiren (曾睿仁), Xie Jiangtao (谢江涛), Li Peihua (李培华)

Affiliation:

School of Information and Communication Engineering, Dalian University of Technology, Dalian, Liaoning 116024

Keywords:

Image classification; global covariance pooling; matrix power normalization; maximum singular value power normalization

Funding:

National Natural Science Foundation of China

Grant number:

61971086

Year of publication:

2024

DOI:

10.11896/jsjkx.230200140

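Computer Science (计算机科学)

Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)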

CSTPCD; Peking University Core Journal (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(4)

Number of references: 31