Optimization Method for Classification Tasks Based on the PVT Model


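Abstract (translated from the Chinese): The PVT model is a deep learning model that improves on the Vision Transformer (ViT). Unlike ViT, which processes images at a single scale, PVT introduces a pyramid structure to capture multi-scale information in images more fully and thereby improve model performance. This work introduces a layered activation mechanism into PVT to improve its performance and robustness on classification tasks. The mechanism assigns saturation states at the level of the layer, which reduces the layer-wise fluctuation of activation outputs caused by changes in the input. To evaluate the optimized model, a dedicated multi-source plant dataset was created and converted into noisy images to simulate real-world conditions more realistically. Experiments on CIFAR10, InterImage, and the plant multi-source dataset show some improvement in classification accuracy on all three.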

English title: Optimization Method for PVT Model Classification Task

English abstract: The PVT model is a deep learning model that improves on the Vision Transformer (ViT). Unlike the single-scale processing of ViT, PVT introduces a pyramid structure that aims to capture the multi-scale information in images more comprehensively, improving model performance. A layered activation mechanism is introduced into PVT to enhance its performance and robustness in classification tasks. The mechanism assigns saturation states to the layers to reduce the fluctuation of activation outputs across a layer caused by input changes. To evaluate the effectiveness of the optimized model, a dedicated multi-source plant dataset is created and transformed into noisy images to simulate real scenes more realistically. Experiments are conducted on CIFAR10, InterImage, and the plant multi-source dataset, and classification accuracy improves in all cases.
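The abstract does not give a formula for the layered activation mechanism, so the following is only a minimal sketch of one possible reading: the gate that decides how strongly each activation saturates is computed from layer-normalized statistics (mean and variance over the whole layer) rather than from each raw element. The function names `layer_gates`, `layered_silu`, and `elementwise_silu`, the SiLU-style gate, and the `eps` value are all assumptions made for illustration and are not taken from the paper.

```python
import math


def layer_gates(x, eps=1e-5):
    """Sigmoid gates computed from the layer-normalized input.

    Assumption: statistics are taken over the whole layer (all elements of x),
    so adding a constant offset to every input leaves the gates unchanged.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var + eps)
    return [1.0 / (1.0 + math.exp(-(v - mean) / std)) for v in x]


def layered_silu(x, eps=1e-5):
    """Hypothetical layer-level SiLU: raw input scaled by layer-level gates."""
    return [v * g for v, g in zip(x, layer_gates(x, eps))]


def elementwise_silu(x):
    """Standard SiLU for comparison: each gate depends only on its own element."""
    return [v / (1.0 + math.exp(-v)) for v in x]


if __name__ == "__main__":
    clean = [1.0, 2.0, 3.0]
    shifted = [v + 100.0 for v in clean]  # same pattern with a large offset
    print(layer_gates(clean))
    print(layer_gates(shifted))
    print(layered_silu(clean))
    print(elementwise_silu(clean))
```

In this sketch the gates for `clean` and `shifted` coincide, because the offset is removed by the layer normalization before the sigmoid is applied. That is one way a layer-level saturation state could damp input-driven fluctuations. Whether the paper uses this exact form is not stated in the abstract.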

English keywords:

image classification; PVT; layered activation mechanism; multi-source dataset

Authors:

赵志闯、古丽娜孜、帕孜来提、陈藜韦


Affiliation:

School of Network Security and Information Technology, Yili Normal University, Yining, Xinjiang 835012, China

Keywords:

image classification; PVT; layered activation mechanism; multi-source dataset

Publication year:

2024

DOI:

10.20149/j.cnki.issn1008-1739.2024.06.013

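计算机与网络 (Computer & Network)

Electronic Wireless Communication Professional Information Network, Ministry of Industry and Information Technology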

CHSSCD

Impact factor: 0.149

ISSN: 1008-1739

Year, volume (issue): 2024, 50(6)