计算机与网络 (Computer & Network), 2024, Vol. 50, Issue 6: 542-548. DOI: 10.20149/j.cnki.issn1008-1739.2024.06.013

基于PVT模型分类任务的优化方法

Optimization Method for PVT Model Classification Task

赵志闯¹, 古丽娜孜¹, 帕孜来提¹, 陈藜韦¹

Author information

  • 1. School of Network Security and Information Technology, Yili Normal University, Yining, Xinjiang 835012, China

Abstract

The Pyramid Vision Transformer (PVT) is a deep learning model that improves on the Vision Transformer (ViT). Whereas ViT processes images at a single scale, PVT introduces a pyramid structure to capture multi-scale information in images more comprehensively and thereby improve model performance. This paper introduces a layered activation mechanism into PVT to improve its performance and robustness on classification tasks. The mechanism assigns saturation states to layers, which reduces the layer-wise fluctuation of activation outputs caused by changes in the input. To evaluate the optimized model, a dedicated multi-source plant dataset is created and converted into noisy images to simulate real-world conditions more realistically. Experiments on CIFAR10, InterImage, and the plant multi-source dataset show that classification accuracy improves to some degree on all three.

Key words

image classification/PVT/layered activation mechanism/multi-source dataset

Publication year

2024
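计算机与网络 (Computer & Network)
Electronic Radio Communication Professional Information Network, Ministry of Industry and Information Technology (工业和信息化部电子无线通信专业情报网)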

CHSSCD
Impact factor: 0.149
ISSN:1008-1739