Research progress and challenges in real-time semantic segmentation for deep learning
Semantic segmentation is an important research direction in computer vision; its purpose is to classify an input image at the pixel level according to predefined categories. Real-time semantic segmentation, a subfield of semantic segmentation, adds a speed requirement to general semantic segmentation and is widely used in fields such as unmanned driving, medical image analysis, video surveillance, and aerial image analysis. A segmentation method must achieve not only high segmentation accuracy but also high segmentation speed (specifically, a processing rate of at least 30 frames per second). With the rapid development of deep learning and neural networks, real-time semantic segmentation has achieved a number of research results. Most previous reviews discuss semantic segmentation in general, whereas review papers on real-time semantic segmentation methods are few. In this paper, we systematically summarize deep-learning-based real-time semantic segmentation algorithms, building on the existing work of previous researchers. We first introduce the concept of real-time semantic segmentation and then, according to the number and quality of the labels used in training, categorize the existing deep-learning-based real-time semantic segmentation methods into three classes: strongly supervised learning, weakly supervised learning, and unsupervised learning. Strongly supervised learning methods are categorized from three perspectives: improving accuracy, improving speed, and other methods. Accuracy improvement methods are further divided into subcategories according to network structure and feature fusion method. According to network structure, real-time semantic segmentation methods can be categorized into encoder-decoder, two-branch, and multibranch structures; the representative encoder-decoder networks are the fully convolutional 
network (FCN) and UNet; the two-branch networks are the BiSeNet series; and the multibranch structure includes ICNet and DFANet. According to the way features are fused, real-time semantic segmentation methods can be categorized into multiscale feature fusion and attention mechanisms. According to how features are sampled during multiscale feature fusion, this study divides multiscale feature fusion into atrous spatial pyramid pooling and ordinary pyramid pooling; attention mechanisms are further divided into self-attention, channel attention, and spatial attention according to how the attention vector is computed. The methods for improving speed are analyzed and discussed from the perspectives of improved convolutional blocks and lightweight networks; the improved convolutional blocks are divided into separable convolution (itself divided into depthwise separable convolution and spatially separable convolution), grouped convolution, and atrous convolution. Among the other strongly supervised methods, we specifically include knowledge distillation, Transformer-based methods, and pruning, which are less often mentioned in the existing literature. Given the large number of real-time semantic segmentation methods based on strongly supervised learning, we also perform a comparative analysis of the strengths and weaknesses of all the methods mentioned. Real-time semantic segmentation based on weakly supervised learning is classified into methods based on image-level labels, point labels, object bounding-box labels, and object scribble labels. The concept of unsupervised learning is introduced, and the unsupervised semantic segmentation methods commonly used at present are described, including methods that introduce the generalized domain adaptation problem and methods that introduce an unsupervised 
pre-adaptation task. Subsequently, the datasets and evaluation metrics commonly used in real-time semantic segmentation are introduced. In addition to the street scene datasets commonly used in unmanned driving, this study adds medical image datasets. For the evaluation metrics, this study gives a detailed introduction to the accuracy and speed measures and then compares, in tabular form, the experimental results of existing algorithms on these datasets to present the latest research progress in the field. The application scenarios of real-time semantic segmentation are then elaborated. Real-time semantic segmentation can be applied to automatic driving, where it segments road scene images in a short time to help identify roads, traffic signs, pedestrians, vehicles, and other objects. By segmenting medical images at the pixel level, real-time semantic segmentation can also help doctors identify and localize lesion areas accurately. In natural disaster monitoring and emergency rescue, real-time semantic segmentation can quickly identify aircraft and recognize disaster areas in aerial images; real-time segmentation of scenes and objects in surveillance videos can provide accurate and intelligent data for surveillance systems. Then, based on the specific application scenarios of real-time semantic segmentation and the problems encountered at this stage, this study identifies the following challenges: 1) mobile segmentation, because large-scale computation is difficult to carry out on low-storage devices; 2) reducing the dependence of efficient networks on hardware devices; 3) the experimental accuracy of current real-time semantic segmentation models, which struggles to reach the standard required for automatic driving; 4) the lack of scene data for medical image and 3D point cloud 
design. Finally, this study gives an outlook on future directions of real-time semantic segmentation that are worth researching, e.g., occlusion segmentation, real-time semantic segmentation of small targets, adaptive learning models, cross-modal joint learning, data-centered real-time semantic segmentation, and small-sample real-time semantic segmentation.
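
As a rough illustration of why the convolution variants named above (separable, grouped, and atrous convolution) are treated as speed-oriented building blocks, the following Python sketch counts the weights of each variant. The function names are hypothetical and are not taken from any surveyed method. Biases are ignored, and the spatially separable case assumes that the intermediate feature map has as many channels as the output; both choices are assumptions made only for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def grouped_conv_params(c_in, c_out, k, groups):
    """Weights of a grouped k x k convolution: each output channel
    only sees c_in / groups input channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def spatially_separable_params(c_in, c_out, k):
    """A k x 1 convolution followed by a 1 x k convolution.
    Assumption: the intermediate feature map has c_out channels."""
    return k * c_in * c_out + k * c_out * c_out


def atrous_effective_kernel(k, dilation):
    """Side length of the region covered by a k x k atrous (dilated)
    kernel; the number of weights stays k * k per channel pair."""
    return dilation * (k - 1) + 1


if __name__ == "__main__":
    c_in, c_out, k = 128, 128, 3  # illustrative layer sizes, not from the paper
    print("standard:           ", standard_conv_params(c_in, c_out, k))
    print("grouped (4 groups): ", grouped_conv_params(c_in, c_out, k, 4))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
    print("spatially separable:", spatially_separable_params(c_in, c_out, k))
    print("atrous field (d=2): ", atrous_effective_kernel(k, 2))
```

Under these assumptions, the separable and grouped variants need fewer weights than a standard convolution of the same shape, while atrous convolution keeps the weight count unchanged and enlarges the region each kernel covers.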