光电子·激光 (Journal of Optoelectronics·Laser), 2024, Vol. 35, Issue 8: 785-792. DOI: 10.16136/j.joel.2024.08.0858

Image quality assessment based on ViT and multi-task self-supervised learning

王华成 (Wang Huacheng) 1, 桑庆兵 (Sang Qingbing) 1, 胡聪 (Hu Cong) 1

Author information

  • 1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China

Abstract

Existing deep learning-based image quality assessment methods suffer from overfitting and poor generalization because labeled data are scarce. To address this, an image quality assessment method based on multi-task self-supervised learning is proposed. First, distorted images covering 17 distortion types are synthesized algorithmically, and each synthesized image receives two labels: its full-reference mean deviation similarity index (MDSI) score and its distortion type. Next, a vision transformer (ViT) is trained with multi-task self-supervised learning to predict both the MDSI score and the distortion type. Finally, the trained model is fine-tuned on the downstream task, transferring the semantic features learned upstream. The method is compared extensively with mainstream no-reference image quality assessment (NR-IQA) methods on several public image quality assessment datasets. On LIVE, CSIQ, TID2013, and CID2013, its test results improve on the best-performing algorithms by 1 to 2 percentage points, indicating that the proposed algorithm outperforms most mainstream NR-IQA algorithms.
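The abstract describes a pretraining objective with two heads, one regressing the MDSI score and one classifying the distortion type, but does not state the loss functions or how the two tasks are weighted. As a minimal sketch under those assumptions, the example below combines a squared error on the score with a softmax cross-entropy over the 17 distortion types. The function names and the `cls_weight` balancing factor are hypothetical and are not taken from the paper.

```python
import math

NUM_DISTORTION_TYPES = 17  # from the abstract; everything else here is assumed


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of one sample, computed with the log-sum-exp trick."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]


def multi_task_loss(pred_score, target_score, type_logits, target_type, cls_weight=1.0):
    """Squared error on the quality score plus weighted distortion-type cross-entropy.

    cls_weight is a hypothetical balancing factor, not a value from the paper.
    """
    if len(type_logits) != NUM_DISTORTION_TYPES:
        raise ValueError("expected one logit per distortion type")
    regression = (pred_score - target_score) ** 2
    classification = softmax_cross_entropy(type_logits, target_type)
    return regression + cls_weight * classification
```

In a real training loop, `pred_score` and `type_logits` would come from two output heads on a shared ViT backbone, and the loss would be averaged over a batch; the per-sample form here only illustrates how the two labels enter one objective.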

Key words

image quality assessment/no-reference/multi-task learning/self-supervised learning/vision transformer (ViT)

Funding

National Natural Science Foundation of China (62006097)

Natural Science Foundation of Jiangsu Province (BK20200593)

Publication year

2024
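光电子·激光 (Journal of Optoelectronics·Laser)
Tianjin University of Technology; Chinese Optical Society

Peking University Core Journal (北大核心)
Impact factor: 1.437
ISSN: 1005-0086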