A survey of weakly supervised semantic segmentation methods based on deep learning

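Chinese abstract: Semantic segmentation is a fundamental task in computer vision that assigns a semantic category label to every pixel, giving a pixel-level understanding of an image. Thanks to advances in deep learning, fully supervised semantic segmentation methods have made great progress. However, these methods usually require large amounts of training data with pixel-level annotations, and the high annotation cost limits their use in practical scenarios such as autonomous driving, medical image analysis, and industrial control. To reduce annotation cost and broaden the application scenarios of semantic segmentation, researchers are paying increasing attention to weakly supervised semantic segmentation based on deep learning, which aims to produce pixel-level segmentation predictions from weak annotations such as image-level, minimum bounding-box, line (scribble), and point annotations. This paper first briefly introduces the semantic segmentation task and analyzes the difficulties faced by fully supervised semantic segmentation, which motivates weakly supervised semantic segmentation. It then introduces the relevant datasets and evaluation metrics. Next, according to the type of weak annotation and the degree of attention it has received, the paper reviews and discusses research progress from three aspects: image-level annotations, other weak annotations, and large-model assistance. The second category covers methods based on minimum bounding-box, line, and point annotations. Finally, the paper analyzes the open problems and challenges in the field and suggests possible future research directions, with the aim of further advancing research on weakly supervised semantic segmentation.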

English title: Weakly supervised semantic segmentation based on deep learning

English abstract: Semantic segmentation is an important and fundamental task in computer vision. Its goal is to assign a semantic category label to each pixel in an image, achieving pixel-level understanding. It has wide applications in areas such as autonomous driving, virtual reality, and medical image analysis. With the development of deep learning in recent years, remarkable progress has been achieved in fully supervised semantic segmentation, which requires a large amount of training data with pixel-level annotations. However, accurate pixel-level annotations are difficult to provide because they consume substantial time, money, and human labor, which limits the widespread application of these methods in practice. To reduce the cost of annotating data and further expand the application scenarios of semantic segmentation, researchers are paying increasing attention to weakly supervised semantic segmentation (WSSS) based on deep learning. The goal is to develop a semantic segmentation model that uses weak annotation information instead of dense pixel-level annotations to predict pixel-level segmentation accurately. Weak annotations mainly include image-level, bounding-box, scribble, and point annotations. The key problem in WSSS is how to exploit the limited annotation information, incorporate appropriate training strategies, and design powerful models to bridge the gap between weak supervision and pixel-level annotations. This study aims to classify and summarize WSSS methods based on deep learning, analyze the challenges and problems encountered by recent methods, and provide insights into future research directions. First, we introduce WSSS as a solution to the limitations of fully supervised semantic segmentation. Second, we introduce the related datasets and evaluation metrics. Third, we review and discuss the research progress of WSSS in three categories: image-level annotations, other weak annotations, and assistance from large-scale models, where the second category includes bounding-box, scribble, and point annotations. Specifically, image-level annotations only provide the object categories contained in the image, without specifying the positions of the target objects. Existing methods typically follow a two-stage training process: producing a class activation map (CAM), whose high-response areas serve as the initial seed regions used to generate high-quality pixel-level pseudo labels; and training a fully supervised semantic segmentation model using the produced pixel-level pseudo labels. According to whether the pixel-level pseudo labels are updated during training in the second stage, WSSS based on image-level annotations can be further divided into offline and online approaches. Offline approaches treat the two stages independently: the initial seed regions are optimized to obtain more reliable pixel-level pseudo labels, which then remain unchanged throughout the second stage. These approaches are often divided into six classes according to their optimization strategies: ensemble of CAMs, image erasing, co-occurrence relationship decoupling, affinity propagation, additional supervised information, and self-supervised learning. In online approaches, the pixel-level pseudo labels keep updating during the entire second-stage training process, so the production of pixel-level pseudo labels and the semantic segmentation model are jointly optimized. Online approaches can be trained end to end, making the training process more efficient. Compared with image-level annotations, the other weak annotations, including bounding-box, scribble, and point annotations, are stronger supervision signals. Among them, bounding-box annotations provide not only object category labels but also information about object positions. The regions outside the bounding box are always considered background, while the box regions contain both foreground and background areas. Therefore, for bounding-box annotations, existing research mainly focuses on accurately distinguishing foreground from background within the bounding box, thereby producing more accurate pixel-level pseudo labels for training the subsequent semantic segmentation network. Scribble and point annotations not only indicate the categories of objects contained in the image but also provide local positional information about the target objects. For scribble annotations, more complete pseudo labels can be produced to supervise semantic segmentation by inferring the category of unlabeled regions from the annotated scribbles. For point annotations, the associated semantic information is expanded to the entire image through label propagation, distance metric learning, and loss function optimization. In addition, with the rapid development of large-scale models, this paper further discusses recent research on using large-scale models to assist WSSS tasks. Large-scale models can leverage their pretrained general knowledge to understand images and generate accurate pixel-level pseudo labels, thus improving the final segmentation performance. This paper also reports quantitative segmentation results on the PASCAL Visual Object Classes 2012 (PASCAL VOC 2012) dataset to evaluate the performance of different WSSS methods. Finally, four challenges and potential future research directions are presented. First, a performance gap remains between weakly supervised and fully supervised methods. To bridge this gap, research should continue to improve the accuracy of pixel-level pseudo labels. Second, when WSSS models are applied to real-world scenarios, they may encounter object categories that never appeared in the training data. This requires the models to have some ability to identify and segment unknown objects. Third, existing research mainly focuses on improving accuracy without considering the model size and inference speed of WSSS networks, which poses a major challenge for deploying these models in real-world applications that require real-time estimation and online decisions. Fourth, the scarcity of relevant datasets for evaluating different WSSS models and algorithms is also a major obstacle, which leads to performance degradation and limits generalization capability. Therefore, large-scale WSSS datasets with high quality, great diversity, and wide variation of image types must be constructed.
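The first stage described in the abstract (turning class activation maps into seed regions and pixel-level pseudo labels from image-level labels only) can be illustrated with a minimal NumPy sketch. This is not the procedure of any specific method covered by the survey: the function names, the per-class max normalization, and the background threshold of 0.3 are all assumptions chosen for illustration.

```python
import numpy as np

def class_activation_maps(features, fc_weights):
    """Compute one CAM per class.

    features:   (C, H, W) feature maps from the last convolutional layer.
    fc_weights: (K, C) weights of the linear classifier that follows
                global average pooling.
    Returns (K, H, W) maps, rectified and scaled to [0, 1] per class.
    """
    cams = np.einsum("kc,chw->khw", fc_weights, features)
    cams = np.maximum(cams, 0.0)  # keep positive evidence only
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    return cams / (peak[:, None, None] + 1e-8)

def cams_to_pseudo_label(cams, image_labels, bg_threshold=0.3):
    """Turn CAMs into a pixel-level pseudo label.

    image_labels: class indices present in the image (the weak annotation).
    bg_threshold: assumed cut-off; pixels whose best score falls below it
                  are labeled background.
    Returns an (H, W) integer map: 0 = background, k + 1 = class k.
    """
    present = np.asarray(image_labels)
    scores = cams[present]                 # only classes known to be present
    best = scores.argmax(axis=0)           # index into `present` per pixel
    label = present[best] + 1
    label[scores.max(axis=0) < bg_threshold] = 0
    return label

# Toy example: 2 feature channels, 3 classes, classes 0 and 1 present.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 1.0]]])
fc_weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
cams = class_activation_maps(features, fc_weights)
pseudo_label = cams_to_pseudo_label(cams, image_labels=[0, 1])
print(pseudo_label)
```

In an offline approach the resulting map would be refined and then fixed before the segmentation network is trained; in an online approach it would be recomputed as training proceeds.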

English keywords:

semantic segmentation; deep learning; weakly supervised semantic segmentation (WSSS); image-level annotation; bounding-box annotation; scribble annotation; point annotation; large-scale model

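Authors:

Xiang Weikang (项伟康), Zhou Quan (周全), Cui Jingcheng (崔景程), Mo Zhiyi (莫智懿), Wu Xiaofu (吴晓富), Ou Weihua (欧卫华), Wang Jingdong (王井东), Liu Wenyu (刘文予)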

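Author affiliations:

School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023

Guangxi Colleges and Universities Key Laboratory of Intelligent Software, Wuzhou University, Wuzhou 543003

School of Big Data and Computer Science, Guizhou Normal University, Guiyang 550025

Baidu, Beijing 100085

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430071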

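Chinese keywords:

semantic segmentation; deep learning; weakly supervised semantic segmentation (WSSS); image-level annotation; minimum bounding-box annotation; line annotation; point annotation; large model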

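Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; Open Research Project of the Guangxi Colleges and Universities Key Laboratory of Intelligent Software

Project numbers:

61876093; 62262005; 2023B01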

Publication year:

2024

DOI:

10.11834/jig.230628

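Journal of Image and Graphics (中国图象图形学报)

Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Beijing Institute of Applied Physics and Computational Mathematics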

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(5)

Number of references: 110