RGB-D Image Semantic Segmentation using Multi-scale Encoder Features Fusion
To address the large scale variation of target objects in indoor-scene semantic segmentation, an RGB-D semantic segmentation network that fuses multi-scale encoder features is proposed, built on ACFNet. To make full use of the network's multi-scale features, a multi-scale feature fusion module incorporating a pooling operation (PMFM) is proposed; it takes as input the fused features derived from both RGB and depth features at different encoder stages. In addition, a multiple skip connection module (MSCM) is designed, which uses the next-stage feature map, containing richer scene semantics, to correct the current-stage feature map before it is concatenated and passed to the corresponding decoder stage through a skip connection. The proposed model is evaluated on two widely used public datasets, NYUD V2 and SUN RGB-D, achieving a mean Intersection-over-Union of 52.6% on NYUD V2 and 48.8% on SUN RGB-D. With these two improvements, the experiments demonstrate that the proposed method achieves high segmentation accuracy, outperforming ACFNet and the other compared semantic segmentation methods.
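The two modules described above can be illustrated with a minimal NumPy sketch. The fusion rule (element-wise sum), the pooling window sizes, and the ×2 upsampling in the MSCM correction are all assumptions for illustration; the paper's actual PMFM and MSCM designs may differ in these details.

```python
import numpy as np

def avg_pool(x, k):
    # x: (C, H, W); non-overlapping k×k average pooling (H, W divisible by k)
    C, H, W = x.shape
    return x.reshape(C, H // k, k, W // k, k).mean(axis=(2, 4))

def upsample_nearest(x, k):
    # Nearest-neighbour upsampling by factor k along both spatial axes
    return x.repeat(k, axis=1).repeat(k, axis=2)

def pmfm_sketch(rgb_feat, depth_feat, pool_sizes=(1, 2, 4)):
    # PMFM sketch: fuse RGB and depth features (element-wise sum, an
    # assumed fusion rule), pool the result at several window sizes to
    # capture objects of different scales, upsample each branch back to
    # the input resolution, and concatenate along the channel axis.
    fused = rgb_feat + depth_feat
    branches = []
    for k in pool_sizes:
        if k == 1:
            branches.append(fused)
        else:
            branches.append(upsample_nearest(avg_pool(fused, k), k))
    return np.concatenate(branches, axis=0)

def mscm_correct(curr_feat, deeper_feat):
    # MSCM sketch: the next-stage (deeper) feature map carries more scene
    # semantics; upsample it ×2 and add it to correct the current-stage
    # map before the skip connection (additive correction is an assumption;
    # channel counts are taken as equal for simplicity).
    return curr_feat + upsample_nearest(deeper_feat, 2)

rgb = np.random.rand(8, 16, 16)      # toy RGB-branch feature map
depth = np.random.rand(8, 16, 16)    # toy depth-branch feature map
out = pmfm_sketch(rgb, depth)
print(out.shape)                     # (24, 16, 16): 3 branches × 8 channels

deeper = np.random.rand(8, 8, 8)     # toy next-stage feature map
corrected = mscm_correct(rgb, deeper)
print(corrected.shape)               # (8, 16, 16)
```

The corrected map would then be concatenated with the decoder features at the matching stage, as the skip-connection scheme above describes.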