Semantic Topological Maps-Based Reasoning for Vision-and-Language Navigation in Continuous Environments
To address the limited reasoning ability of existing vision-and-language navigation methods in continuous environments, a reasoning method based on semantic topological maps is proposed. First, regions and objects in the navigation environment are identified through scene-understanding auxiliary tasks, and a knowledge base of spatial proximity is constructed. Second, the agent interacts with the environment in real time during navigation, collecting location information, encoding visual features, and predicting semantic labels of regions and objects, thereby gradually generating a semantic topological map. On this basis, an auxiliary reasoning localization strategy is designed: a self-attention mechanism extracts object and region information from the navigation instructions, and the spatial proximity knowledge base is combined with the semantic topological map to infer and localize objects and regions. This strategy assists navigation decisions and ensures that the agent's navigation trajectory aligns with the instructions. Experimental results on the public datasets R2R-CE and RxR-CE demonstrate that the proposed method achieves a higher navigation success rate.
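The interplay between the semantic topological map and the spatial proximity knowledge base can be illustrated with a minimal Python sketch. It is not the proposed implementation: the names `Node`, `SemanticTopoMap`, `localize`, and the `proximity` dictionary are hypothetical, semantic labels are assumed to be given rather than predicted by the auxiliary tasks, the self-attention extraction of targets from the instruction is omitted, and the scoring rule (taking the strongest proximity prior among a node's labels) is an assumption chosen for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One visited location with its (assumed already predicted) semantic labels."""
    position: tuple[float, float, float]
    region: str
    objects: set[str] = field(default_factory=set)


class SemanticTopoMap:
    """Hypothetical incremental topological map: nodes plus undirected edges."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.edges: dict[int, set[int]] = {}

    def add_node(self, position, region, objects, neighbor=None) -> int:
        """Add a newly observed location and optionally link it to a known node."""
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(position, region, set(objects))
        self.edges[node_id] = set()
        if neighbor is not None:
            self.edges[node_id].add(neighbor)
            self.edges[neighbor].add(node_id)
        return node_id


def localize(topo_map: SemanticTopoMap, target: str, proximity: dict) -> int | None:
    """Return the id of the node most likely to be near `target`.

    `proximity` maps (target, cue) pairs to a prior in [0, 1]; this stands in
    for the spatial proximity knowledge base (assumed format).
    """
    if not topo_map.nodes:
        return None

    def score(node: Node) -> float:
        # Direct observation of the target beats any inferred evidence.
        if target == node.region or target in node.objects:
            return 1.0
        # Otherwise use the strongest prior among the node's semantic labels.
        cues = node.objects | {node.region}
        return max((proximity.get((target, cue), 0.0) for cue in cues), default=0.0)

    return max(topo_map.nodes, key=lambda node_id: score(topo_map.nodes[node_id]))
```

In this sketch, an instruction mentioning a towel that has not yet been observed would steer the agent toward a node labeled as a bathroom, provided the knowledge base assigns the pair a high prior, for example `proximity = {("towel", "bathroom"): 0.9}`.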