To address the problem that, in deep-learning-based feature extraction networks, the features extracted by deep convolutional neural networks lack low-level semantic information, this paper proposes a semantically enhanced multi-view stereo vision method. First, we propose a ConvLSTM (Convolutional Long Short-Term Memory) semantic aggregation network, which uses the ConvLSTM structure to predict, from the feature maps extracted by multiple convolutional layers, a feature map that fuses the semantic information of every layer. As high-level image features are extracted layer by layer in space, the memory function of the long short-term memory structure strengthens the low-level semantic information in the high-level feature maps, which improves reconstruction in weakly textured regions and the robustness and completeness of 3D reconstruction. Second, we propose a visibility network that, starting from the grayscale image, highlights the features of visible regions on the feature map and thereby increases the influence of those regions, which helps improve the 3D reconstruction. Finally, the texture information of the image is extracted and fed into the ConvLSTM semantic aggregation network to extract deep features, which further improves reconstruction in weakly textured regions. Compared with mainstream multi-view stereo reconstruction methods, the proposed method achieves better reconstruction results.
3D reconstruction; deep learning; multi-view stereo vision; feature extraction; semantic aggregation network
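
The semantic aggregation idea in the abstract, treating the feature maps of successive convolutional layers as a sequence and letting a ConvLSTM carry shallow-layer information into the final map, can be illustrated with a minimal NumPy sketch. This is not the authors' network. The names `ConvLSTMCell`, `aggregate`, and `conv2d_same` are hypothetical, the weights are random rather than trained, the peephole terms of the original ConvLSTM formulation are omitted, and all layers are assumed to have already been brought to the same channel count and spatial size.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1, zero-padded cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    # Accumulate one kernel tap at a time over shifted views of the input.
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvLSTMCell:
    """Hypothetical ConvLSTM cell without peephole connections."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.hid_ch = hid_ch
        # A single kernel yields all four gates from the concatenation [x, h].
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))

    def step(self, x, h, c):
        gates = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(gates, 4, axis=0)
        c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
        h_next = sigmoid(o) * np.tanh(c_next)              # new hidden state
        return h_next, c_next


def aggregate(feature_maps, cell):
    """Feed feature maps from shallow to deep; return the last hidden state."""
    _, h, w = feature_maps[0].shape
    hid = np.zeros((cell.hid_ch, h, w))
    mem = np.zeros_like(hid)
    for fmap in feature_maps:
        hid, mem = cell.step(fmap, hid, mem)
    return hid


# Usage: three stand-in "layers" with 8 channels on a 16 x 20 grid.
rng = np.random.default_rng(1)
layers = [rng.normal(size=(8, 16, 20)) for _ in range(3)]
cell = ConvLSTMCell(in_ch=8, hid_ch=8)
fused = aggregate(layers, cell)
print(fused.shape)  # (8, 16, 20)
```

Because the cell memory is carried from step to step, the shallowest feature map still influences the final output. In this sketch that is the mechanism by which low-level information would be retained in the aggregated map.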