DCT-YOLOv5: Designing Object Detection Algorithms from a Frequency Perspective

The discrete cosine transform (DCT) is one of the core steps of the JPEG compression algorithm; it converts pixel data in the spatial domain of an image into coefficients in the frequency domain. Algorithms that combine DCT with deep learning are common, but they do not analyze convolutional structures from a frequency perspective. To address this problem and further improve object detection performance, we propose an improved algorithm, DCT-YOLOv5. First, we show that convolutional neural networks (CNNs), Transformers, and MLP architectures all implicitly model the frequency domain, which supports two default principles of earlier model design: the effective receptive field is always smaller than the theoretical receptive field, and multiple small convolution kernels are preferable to one large kernel. Second, the number of output channels is chosen from the number of input channels and the convolution kernel size so that the transformation is approximately lossless; the down-sampling stage is the only place where the number of channels changes. Finally, a comparison of DCT and convolution with fixed parameters shows that the difference between the two stays within ±0.8%. To minimize computation, grouped convolution with a fixed number of channels per group is introduced. The model uses YOLOv5 as its baseline, and extensive experiments on the COCO2017 dataset validate the effectiveness of the proposed method. It reaches a mAP@.5 of 28.9% at a detection speed of 277.8 FPS, a relative improvement of 1.3% over the baseline model. The test results indicate that the improved model has significantly improved accuracy and can run on platforms with less computing power.
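
The abstract's ideas of fixed-parameter DCT kernels and an approximately lossless down-sampling step can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a k×k orthonormal DCT-II basis applied per channel with stride k, so that C input channels become C·k² output channels and no information is lost. The function names (`dct_matrix`, `dct_kernels`, `dct_downsample`, `dct_upsample`) and the default k = 2 are hypothetical choices made for illustration.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # sample index
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)  # DC row scaling makes the matrix orthonormal
    return m

def dct_kernels(k: int) -> np.ndarray:
    """k*k fixed 2-D DCT kernels, shape (k*k, k, k), built as outer products."""
    d = dct_matrix(k)
    return np.einsum("ui,vj->uvij", d, d).reshape(k * k, k, k)

def dct_downsample(x: np.ndarray, k: int = 2) -> np.ndarray:
    """Apply the fixed kernels with stride k: (C, H, W) -> (C*k*k, H/k, W/k)."""
    c, h, w = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be multiples of k"
    # Split the image into non-overlapping k x k patches: axes (c, a, i, b, j).
    patches = x.reshape(c, h // k, k, w // k, k)
    out = np.einsum("fij,caibj->cfab", dct_kernels(k), patches)
    return out.reshape(c * k * k, h // k, w // k)

def dct_upsample(y: np.ndarray, k: int = 2) -> np.ndarray:
    """Invert dct_downsample; the kernels are orthonormal, so transpose suffices."""
    ck2, hh, ww = y.shape
    c = ck2 // (k * k)
    coeffs = y.reshape(c, k * k, hh, ww)
    patches = np.einsum("fij,cfab->caibj", dct_kernels(k), coeffs)
    return patches.reshape(c, hh * k, ww * k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))
    y = dct_downsample(x, k=2)
    print(y.shape)                           # channels grow by k*k, resolution shrinks by k
    print(np.allclose(dct_upsample(y), x))   # reconstruction check
```

In this sketch each channel is transformed independently, which is analogous to a grouped convolution with one input channel per group and non-trainable weights. The group size actually used in DCT-YOLOv5 is not stated in the abstract.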

Keywords: discrete cosine transform; convolutional neural networks; down-sampling; fixed parameter; YOLOv5

Wang Tao (王涛), Zhang Duzhen (张笃振)

School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, Jiangsu, China

General Program of Natural Science Research of Jiangsu Higher Education Institutions

19KJB520032

2024

Computer Technology and Development (计算机技术与发展)
Shaanxi Computer Society

CSTPCD
Impact factor: 0.621
ISSN: 1673-629X
Year, volume (issue): 2024, 34(10)