Aiming at the low accuracy and poor robustness of traditional simultaneous localization and mapping (SLAM) pose estimation in dynamic scenes, a dynamic visual SLAM algorithm based on adaptive optimization of feature point weights is proposed. Firstly, the Mask Region-based Convolutional Neural Network (Mask R-CNN) is used to semantically segment the input image and obtain the mask of dynamic feature points. On this basis, the static feature points are matched between frames to obtain an initial pose transformation. Then, a motion consistency detection algorithm and a multi-view geometry algorithm are applied to the image to obtain their respective dynamic feature point masks. Next, a weight function for the feature points is constructed from these three sources of dynamic feature point information. By minimizing the reprojection error, the influence of each feature point on pose optimization is adjusted adaptively, which reduces the effect of dynamic targets in the scene on SLAM accuracy. Finally, simulation tests on the dynamic datasets of the Technical University of Munich (TUM) show that, in indoor high-dynamic scenes, the root mean square error (RMSE) of the absolute trajectory error (ATE) is only 3.1% of that of oriented FAST and rotated BRIEF simultaneous localization and mapping (ORB-SLAM2). Compared with dynamic SLAM systems such as DS-SLAM and DynaSLAM, the absolute trajectory error is 52% of that of DS-SLAM and 86.1% of that of DynaSLAM. The results show that the proposed algorithm can significantly improve the localization accuracy and robustness of the SLAM system in highly dynamic environments.
Keywords: visual SLAM; dynamic scene; semantic segmentation; motion consistency detection; multi-view geometry; feature point weight
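As a rough illustration of the weighting scheme described in the abstract, the Python sketch below fuses the three dynamic cues (semantic segmentation mask, motion consistency check, multi-view geometry) into a per-point weight and evaluates a weighted reprojection error. The fusion rule, function names, and example camera parameters are assumptions made for this sketch, not the paper's exact formulation.

```python
import numpy as np

def combine_dynamic_weights(semantic_prob, motion_prob, geometry_prob):
    # Treat each cue as an independent per-point probability that the feature
    # is dynamic; this product fusion rule is an assumption for illustration.
    p_static = (1.0 - semantic_prob) * (1.0 - motion_prob) * (1.0 - geometry_prob)
    return p_static  # weight in [0, 1]; likely-dynamic points get weight near 0

def weighted_reprojection_error(K, R, t, points_3d, observations, weights):
    # Project 3-D map points with a pinhole model and sum the per-point
    # squared pixel errors, scaled by the dynamic-aware weights.
    cam = (R @ points_3d.T).T + t            # world -> camera frame, N x 3
    proj = (K @ cam.T).T                     # homogeneous pixel coords, N x 3
    uv = proj[:, :2] / proj[:, 2:3]          # perspective division, N x 2
    residuals = np.linalg.norm(uv - observations, axis=1)
    return float(np.sum(weights * residuals ** 2))

# Example: all three cues flag the second point as dynamic, so it barely
# contributes to the objective that a pose optimizer would minimize.
K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])               # illustrative intrinsics
R, t = np.eye(3), np.zeros(3)
points_3d = np.array([[0.2, 0.1, 2.0], [-0.3, 0.0, 1.5]])
observations = np.array([[372.0, 265.8], [214.5, 239.5]])
weights = combine_dynamic_weights(np.array([0.0, 0.9]),
                                  np.array([0.1, 0.8]),
                                  np.array([0.0, 0.7]))
print(weighted_reprojection_error(K, R, t, points_3d, observations, weights))
```

In a full system, these weighted residuals would be minimized over the camera pose with a nonlinear least-squares solver, so that points judged dynamic by the three cues have little influence on the estimated trajectory.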