Saliency guided object complementary hiding for weakly supervised semantic segmentation
Objective Fully supervised semantic segmentation based on deep learning has made remarkable progress and has promoted practical applications such as autonomous driving and medical image analysis. However, fully supervised methods depend on complete pixel-wise annotation, and constructing large-scale pixel-wise annotated datasets requires considerable human labor and resources. To reduce the reliance on accurate annotations, researchers have recently studied semantic segmentation under weaker forms of supervision, such as bounding boxes, scribbles, points, and image-level labels. Weakly supervised semantic segmentation based on image-level labels uses only category labels to train the segmentation network, which significantly reduces the annotation cost. Most existing weakly supervised semantic segmentation methods use the class activation map (CAM) to locate target objects. On the one hand, the CAM generated by a classification network is sparse and focuses only on the most discriminative areas of an object. The CAM also contains some misactivated pixels, which may provide improper guidance for the subsequent segmentation task. On the other hand, the performance of the segmentation network depends on the quality of the pseudo labels, and an accurate pseudo label requires the shape and boundary of the object. This information cannot be obtained directly and accurately from image-level labels, so guaranteeing the quality of the pseudo labels is difficult. This paper proposes a new saliency-guided weakly supervised semantic segmentation algorithm that obtains more complete CAMs and thereby improves the performance of the segmentation model.
Method First, research shows that randomly hiding parts of the target in an image can enhance the capability of the network to locate the complete target. However, part of the image information cannot be used when the image is hidden directly at random. By contrast, the complementary hiding method can use all the image information. Even so, the randomness of the hiding makes it difficult to guarantee that the target object is hidden as expected; in some cases, only the background area is hidden. A saliency-guided object complementary hiding method is therefore proposed. Using the foreground information provided by the saliency map, complementary random hiding is applied to the object in the image to obtain complementary image pairs. The CAMs of each complementary image pair are then fused as supervision to improve the capability of the network to obtain complete CAMs. Second, the convolution operation in the classification network used to generate CAMs has a local receptive field, so the features of objects of the same class may differ with changes in scale, illumination, and viewing angle. These differences may result in intra-class inconsistency, which negatively affects the activation and leads to misactivation in the CAM. In addition, the classification network itself has a weak capability to extract complete objects, and expanding the object area well using only the saliency-guided object complementary hiding method remains difficult. Therefore, a dual attention refinement module is introduced to further correct the CAM using global information, and the resulting CAM is used to generate the pseudo labels that train the segmentation network. The predictions of the segmentation network are more accurate than the original pseudo labels. However, these predictions also contain some noise, so directly using them for iterative training cannot guarantee an improvement in the segmentation model. Finally, this paper uses a label iteration refinement strategy that combines the initial prediction of the segmentation network, the CAM, and the saliency map to generate pseudo labels, and iteratively trains the segmentation network to further improve its performance. Saliency maps can effectively distinguish foreground from background but cannot identify object categories. CAMs can accurately locate object categories but lack information about the complete shape of the objects. Segmentation network predictions provide relatively complete information about object boundaries but may contain misclassified pixels. Refining the pseudo labels with the information from all three types of maps markedly reduces the impact of pixel misclassification.
Result The experiments are divided into two parts to verify the effectiveness of the algorithm. In the first part, the proposed CAM generation algorithm is verified and compared with other methods. In the second part, the proposed method is compared with several classical weakly supervised semantic segmentation algorithms, and the effectiveness of each module in the proposed model is analyzed through ablation experiments. The experiments are first conducted on the PASCAL VOC 2012 dataset. The CAM generated by this algorithm is more complete than those of the other methods, and its mean intersection over union (mIoU) is improved by 10.21% compared with the baseline. The segmentation network produces better predictions than the six compared methods, with a 6.9% improvement over the baseline, and the proposed method outperforms the other methods in 13 categories. With an mIoU of 92% in the background category, the proposed method achieves the highest performance among the compared methods, indicating that it makes effective use of saliency maps in training. A multi-object semantic segmentation experiment is also conducted on the COCO 2014 dataset. Compared with PASCAL VOC 2012, this dataset has richer categories and contains more images with multiple object categories, which places a higher demand on the performance of the algorithm. Experimental results show that the mIoU is improved by 0.5% on COCO 2014.
Conclusion This algorithm can obtain a complete CAM, effectively alleviate the problem of insufficient supervision information, and improve the accuracy of weakly supervised semantic segmentation models.
deep learning; weakly supervised semantic segmentation; saliency guidance; class activation map (CAM); attention mechanism
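The saliency-guided object complementary hiding step described under Method could be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names (complementary_hide, fuse_cams), the grid patch size, the saliency threshold, zero-filling of hidden patches, and element-wise maximum as the CAM fusion rule are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def complementary_hide(image, saliency, patch=16, thresh=0.5, p_hide=0.5, rng=None):
    """Build a complementary image pair by hiding salient patches.

    Assumptions (not from the paper): a regular grid of square patches,
    a patch counts as foreground when its mean saliency >= thresh, and
    hidden pixels are filled with zeros.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = saliency.shape
    hide_a = np.zeros((h, w), dtype=bool)
    hide_b = np.zeros((h, w), dtype=bool)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            block = saliency[y:y + patch, x:x + patch]
            if block.mean() < thresh:
                continue  # background patch: stays visible in both images
            # Each foreground patch is hidden in exactly one of the two images,
            # so the pair together still shows the whole object.
            if rng.random() < p_hide:
                hide_a[y:y + patch, x:x + patch] = True
            else:
                hide_b[y:y + patch, x:x + patch] = True
    img_a = image.copy()
    img_b = image.copy()
    img_a[hide_a] = 0  # boolean (H, W) mask zeroes all channels of an (H, W, C) image
    img_b[hide_b] = 0
    return img_a, img_b, hide_a, hide_b


def fuse_cams(cam_a, cam_b):
    """Fuse the CAMs of a complementary pair (assumed rule: element-wise max)."""
    return np.maximum(cam_a, cam_b)


if __name__ == "__main__":
    # Toy input: a 64x64 RGB image with a salient 32x32 square in the centre.
    image = np.ones((64, 64, 3), dtype=np.float32)
    saliency = np.zeros((64, 64), dtype=np.float32)
    saliency[16:48, 16:48] = 1.0
    img_a, img_b, hide_a, hide_b = complementary_hide(image, saliency)
    print("hidden pixels in A:", int(hide_a.sum()), "in B:", int(hide_b.sum()))
```

Because hiding is restricted to salient patches, the random masks always fall on the object rather than only on the background, which is the failure case of unguided random hiding that the abstract points out. In a training loop, the two images would each be passed through the classification network and their CAMs fused to supervise the CAM of the unhidden image.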