Convolutional neural networks have achieved remarkable results in computer vision tasks such as object detection and segmentation, and these results depend on the quality of the extracted feature information. Problems such as ambiguous data and varying object shapes pose great challenges for feature extraction. The traditional convolutional structure can only learn contextual information from neighboring spatial locations of the feature map and cannot extract global information, while models such as the self-attention mechanism, although they have a larger receptive field and establish global dependencies, are limited by their high computational complexity and need for large amounts of data. Therefore, this paper proposes a model combining CNN and LSTM, which better integrates the global information of image data while enlarging the local receptive field. It uses the backbone network ConvNeXt-T as the base model, addresses the problem of varying object shapes by splicing convolutional kernels of different sizes to fuse multi-scale features, and aggregates bidirectional long short-term memory networks along both the horizontal and vertical directions to emphasize the interaction between global and local information. Experiments are conducted on the publicly available CIFAR-10, CIFAR-100, and Tiny ImageNet datasets for the image classification task, where the accuracy of the proposed network improves by 3.18%, 2.91%, and 1.03% on the three datasets respectively, compared with the base model ConvNeXt-T. The experiments demonstrate that the improved ConvNeXt-T network substantially improves on the base model in both parameter count and accuracy, and can extract more effective feature information.
Keywords: feature extraction; local receptive field; ConvNeXt-T; multi-scale features; bidirectional long short-term memory network
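The two mechanisms named in the abstract, splicing convolutional kernels of different sizes and scanning the feature map with bidirectional LSTMs along the horizontal and vertical axes, can be sketched as follows. This is a minimal PyTorch illustration, not the paper's implementation: the module names, the kernel sizes (3, 5, 7), the channel splits, and the residual aggregation rule are not specified in the abstract and are assumptions chosen here for concreteness.

```python
# Minimal sketch of (1) multi-scale feature fusion by concatenating ("splicing")
# parallel convolutions with different kernel sizes, and (2) bidirectional LSTMs
# scanned along both spatial axes of a feature map. All design details below are
# illustrative assumptions; the abstract does not give the exact architecture.
import torch
import torch.nn as nn


class MultiScaleConv(nn.Module):
    """Fuse features from convolutional kernels of several sizes."""

    def __init__(self, in_ch: int, out_ch: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_ch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(branch_ch * len(kernel_sizes), out_ch, 1)

    def forward(self, x):
        # Concatenate branch outputs along the channel axis, then mix with a 1x1 conv.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class BiDirectionalLSTM2D(nn.Module):
    """Scan a feature map with bidirectional LSTMs horizontally and vertically."""

    def __init__(self, channels: int):
        super().__init__()
        # channels must be even: each bidirectional LSTM outputs 2 * (channels // 2).
        self.row_lstm = nn.LSTM(channels, channels // 2, bidirectional=True, batch_first=True)
        self.col_lstm = nn.LSTM(channels, channels // 2, bidirectional=True, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        # Horizontal pass: treat each row as a sequence of length W.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_lstm(rows)
        horiz = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Vertical pass: treat each column as a sequence of length H.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_lstm(cols)
        vert = cols.reshape(b, w, h, c).permute(0, 3, 2, 1)
        # Aggregate the two directional context maps with the local features
        # (a simple residual sum, assumed here for illustration).
        return x + horiz + vert


if __name__ == "__main__":
    x = torch.randn(2, 96, 56, 56)  # e.g. a ConvNeXt-T stage-1 feature map
    y = BiDirectionalLSTM2D(96)(MultiScaleConv(96, 96)(x))
    print(y.shape)  # torch.Size([2, 96, 56, 56])
```

Under these assumptions, the convolutional branches supply the enlarged local receptive field, while the row and column LSTMs give every position context from its entire row and column, which is one way to realize the global-local interaction the abstract describes.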