Semantic Topological Maps-Based Reasoning for Vision-and-Language Navigation in Continuous Environments

To address the inadequate reasoning ability of existing vision-and-language navigation methods in continuous environments, a reasoning model based on semantic topological maps is proposed. First, regions and objects in the navigation environment are identified through scene-understanding auxiliary tasks, and a spatial proximity knowledge base is constructed. Second, the agent interacts with the environment in real time during navigation, collecting location information, encoding visual features, and predicting semantic labels of regions and objects, so that a semantic topological map is generated step by step. On this basis, an auxiliary reasoning localization strategy is designed: a self-attention mechanism extracts object and region information from the navigation instruction, and the spatial proximity knowledge base is combined with the semantic topological map to infer and localize objects and regions. This assists navigation decisions and keeps the agent's trajectory aligned with the instruction. Experiments on the public datasets R2R-CE and RxR-CE show that the proposed model achieves a relatively high navigation success rate.
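
The abstract gives no implementation details, so the following is only a minimal illustrative sketch of how a semantic topological map and a spatial proximity knowledge base could be combined to localize an object named in an instruction. All class names, method names, and the scoring rule (`Node`, `ProximityKB`, `SemanticTopoMap`, `localize`) are hypothetical and are not taken from the paper; the self-attention instruction parser and the learned label predictors are omitted, and labels are assumed to be given as plain strings.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One location in the topological map (hypothetical structure)."""
    position: tuple                              # (x, y, z); assumed convention
    region: str                                  # predicted region label
    objects: set = field(default_factory=set)    # predicted object labels


class ProximityKB:
    """Toy knowledge base: how often an object label is seen in a region."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, region, objects):
        for obj in objects:
            self.counts[obj][region] += 1

    def prob(self, obj, region):
        # Relative frequency of `region` among all sightings of `obj`.
        total = sum(self.counts[obj].values())
        return self.counts[obj].get(region, 0) / total if total else 0.0


class SemanticTopoMap:
    """Graph of nodes that is extended as the agent moves."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(set)

    def add_node(self, node_id, node, neighbor=None):
        self.nodes[node_id] = node
        if neighbor is not None:
            # Undirected edge between the new node and an existing one.
            self.edges[node_id].add(neighbor)
            self.edges[neighbor].add(node_id)

    def localize(self, target_obj, kb):
        """Return the id of the node most likely to contain target_obj."""
        def score(node):
            if target_obj in node.objects:       # object directly observed
                return 1.0
            return kb.prob(target_obj, node.region)  # fall back to the KB
        return max(self.nodes, key=lambda nid: score(self.nodes[nid]))


# Usage: build a small knowledge base and map, then look for an unseen object.
kb = ProximityKB()
kb.observe("kitchen", {"fridge", "sink"})
kb.observe("bathroom", {"sink"})

topo = SemanticTopoMap()
topo.add_node(0, Node((0.0, 0.0, 0.0), "hallway"))
topo.add_node(1, Node((2.0, 0.0, 0.0), "kitchen", {"sink"}), neighbor=0)
topo.add_node(2, Node((0.0, 0.0, 3.0), "bathroom"), neighbor=0)

best = topo.localize("fridge", kb)
print(best)
```

In this sketch the fridge has not been observed at any node, so the choice falls back on the knowledge base, which favors the node labeled as a kitchen. A real system would presumably replace the frequency counts and the hard-coded labels with learned predictions.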

Vision-and-Language Navigation; Visual Reasoning; Multi-modal Data; Embodied Intelligence

XIE Zilong (谢子龙), XU Ming (许明)

School of Software, Liaoning Technical University, Huludao 125105

Doctoral Research Fund of Liaoning Technical University

21-1027

2024

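Pattern Recognition and Artificial Intelligence
Chinese Association of Automation; National Research Center for Intelligent Computing Systems; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences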

CSTPCD; Peking University Core Journal
Impact factor: 0.954
ISSN: 1003-6059
Year, volume (issue): 2024, 37(9)