Optimized Implementation of a Threshold Segmentation Algorithm for FT-M7002
Chen Yun (陈云) 1, Hu Weifang (胡伟方) 1, Wang Mengyuan (王梦园) 1, Shang Jiandong (商建东) 2
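Author information
- 1. School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450000
- 2. Henan Supercomputing Center, Zhengzhou University, Zhengzhou, Henan 450000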
Abstract
As domestically developed high-performance DSPs advance rapidly, they still lack high-performance image processing algorithms that fully exploit their architectural strengths. To address this, the widely used Otsu threshold segmentation algorithm is optimized in parallel for the FT platform. Based on an analysis of the FT-M7002 architecture and of the Otsu threshold segmentation algorithm, the code is manually vectorized with the FeiTeng vector instruction set so that it makes full use of the FT-M7002's ultra-long vector registers, which reduces the number of memory accesses and improves data-level parallelism. Performance was measured for several image matrix sizes. The results show that the optimized threshold comparison module of the segmentation achieves a speedup of 3.74x to 4.39x, and the optimized Otsu threshold segmentation algorithm as a whole achieves a speedup of 1.77x to 1.87x.
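For readers unfamiliar with the two stages the abstract refers to, the following is a minimal scalar sketch in portable C, not the authors' implementation. It assumes an 8-bit greyscale image stored as a flat array, and the function names `otsu_threshold` and `threshold_compare` are hypothetical. It uses no FT-M7002 vector instructions; the 4-way unrolled comparison loop only illustrates the kind of independent, branch-free per-pixel work that lends itself to vectorization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Otsu's method: choose the threshold t that maximizes the between-class
 * variance w0 * w1 * (mu0 - mu1)^2, where class 0 holds pixels <= t and
 * class 1 holds pixels > t. (Hypothetical helper, 8-bit input assumed.) */
uint8_t otsu_threshold(const uint8_t *img, size_t n)
{
    assert(img != NULL && n > 0);

    uint64_t hist[256] = {0};
    for (size_t i = 0; i < n; ++i)
        hist[img[i]]++;                      /* grey-level histogram */

    double total_sum = 0.0;                  /* sum of all grey values */
    for (int g = 0; g < 256; ++g)
        total_sum += (double)g * (double)hist[g];

    double w0 = 0.0, sum0 = 0.0;             /* running count and sum of class 0 */
    double best_var = -1.0;
    uint8_t best_t = 0;

    for (int t = 0; t < 256; ++t) {
        w0 += (double)hist[t];
        sum0 += (double)t * (double)hist[t];
        if (w0 == 0.0)
            continue;                        /* class 0 still empty */
        double w1 = (double)n - w0;
        if (w1 == 0.0)
            break;                           /* class 1 empty from here on */
        double mu0 = sum0 / w0;
        double mu1 = (total_sum - sum0) / w1;
        double var = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
        if (var > best_var) {                /* keep the first maximum */
            best_var = var;
            best_t = (uint8_t)t;
        }
    }
    return best_t;
}

/* Threshold comparison: dst[i] = 255 if src[i] > t, else 0.
 * (src[i] > t) evaluates to 0 or 1; negating it and narrowing to
 * uint8_t yields 0x00 or 0xFF without a conditional branch in the source.
 * The main loop is unrolled by 4; a scalar loop handles the tail. */
void threshold_compare(const uint8_t *src, uint8_t *dst, size_t n, uint8_t t)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = (uint8_t)(-(src[i]     > t));
        dst[i + 1] = (uint8_t)(-(src[i + 1] > t));
        dst[i + 2] = (uint8_t)(-(src[i + 2] > t));
        dst[i + 3] = (uint8_t)(-(src[i + 3] > t));
    }
    for (; i < n; ++i)
        dst[i] = (uint8_t)(-(src[i] > t));
}
```

A caller would first compute `t = otsu_threshold(img, n)` and then call `threshold_compare(img, out, n, t)`. Each iteration of the comparison loop is independent of the others, which is why this stage is a natural candidate for wide vector registers.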
Key words
FT-M7002 / Otsu threshold segmentation / Manual vectorization / Loop unrolling / Data-level parallelism
Funding
Sub-project of the National Key Research and Development Program of China (2018YFB0505000)
Publication year
2024