Deep Reinforcement Learning Visual Target Navigation Method Based on Attention Mechanism and Reward Shaping

Meng Yiyue¹, Guo Chi², Liu Jingnan¹


Author affiliations

1. GNSS Research Center, Wuhan University, Wuhan 430079, Hubei, China
2. Hubei Luojia Laboratory, Wuhan 430079, Hubei, China

Summary

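Visual target navigation is one of the important tasks in visual navigation. Given a navigation target, the agent must explore the environment using only visual image information, navigate to the target, and issue a task-termination (done) action. Existing visual target navigation methods solve the problem within an end-to-end deep reinforcement learning framework, but their navigation success rate and efficiency remain limited. To further improve the agent's navigation performance, we propose a deep reinforcement learning visual target navigation method based on an attention mechanism and reward shaping. To address poor state construction and sparse rewards in reinforcement learning, scaled dot-product attention is used to capture the relationship between the states at the current and previous time steps, which yields a better state for the current time step, and reward shaping is used to set up the reward space automatically, which addresses the sparse-reward problem. Experiments are conducted on the AI2-THOR dataset, and performance is evaluated with success rate and success weighted by path length. Compared with previous methods, the proposed method improves the success rate by 7% and the success weighted by path length by 20%. By building better states and a better reward space with the attention mechanism and reward shaping, the method further improves the agent's navigation success rate and efficiency.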
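The two evaluation metrics can be made concrete with a short sketch. It assumes the commonly used definitions: SR is the fraction of successful episodes, and SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the success indicator, l_i is the shortest-path length from the start to the target, and p_i is the length of the path the agent actually took. The `Episode` record and the function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool            # S_i: agent issued "done" with the target reached
    path_length: float       # p_i: length of the path the agent actually took
    shortest_length: float   # l_i: shortest-path length from start to target (> 0)


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that end in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i); failed episodes add 0."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_length / max(e.path_length, e.shortest_length)
    return total / len(episodes)


if __name__ == "__main__":
    episodes = [
        Episode(True, 10.0, 5.0),   # success, twice the optimal length -> 0.5
        Episode(True, 4.0, 4.0),    # success on the shortest path      -> 1.0
        Episode(False, 12.0, 6.0),  # failure                           -> 0.0
        Episode(True, 6.0, 3.0),    # success, twice the optimal length -> 0.5
    ]
    print(success_rate(episodes))  # 0.75
    print(spl(episodes))           # 0.5
```

In this toy example three of the four episodes succeed (SR = 0.75), but two of the successes took twice the optimal path, so SPL drops to 0.5. This is why SPL is read as a measure of navigation efficiency rather than success alone.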

Abstract

Objectives: As one of the important tasks of visual navigation, visual target navigation requires the agent to explore the environment, navigate to the target, and issue the done action, relying only on visual image information and target information. Existing methods usually adopt a deep reinforcement learning framework to solve visual target navigation problems, but they still have two shortcomings: (1) they ignore the relationship between the states at the current and previous time steps, which leads to poor navigation performance; (2) their reward settings are fixed and sparse, and under sparse rewards the agent cannot learn a good navigation strategy. To solve these problems, we propose a deep reinforcement learning visual target navigation method based on an attention mechanism and reward shaping, which further improves performance on visual target navigation tasks.

Methods: First, the method obtains the path area attended to by the agent at the previous time step by applying scaled dot-product attention between the previous visual image and action. Then, it obtains the path area attended to at the current time step by applying scaled dot-product attention between the current visual image and the previously attended path area, which introduces the relationship between states. Scaled dot-product attention is also used to obtain the currently attended target area. The currently attended path area and target area are concatenated to build a better agent state. In addition, we propose a reward shaping rule to solve the sparse-reward problem, using the cosine similarity between the visual image and the target to automatically build a reward space with a target preference. Finally, the attention method and the reward shaping method are combined to form the deep reinforcement learning visual target navigation method based on attention mechanism and reward shaping.

Results: We conduct experiments on the AI2-THOR dataset and use success rate (SR) and success weighted by path length (SPL) to evaluate the performance of visual target navigation methods. The results show that our method improves SR by 7% and SPL by 20%, which indicates that the agent learns a better navigation strategy. The ablation study further shows that introducing the state relationship and introducing reward shaping each improve navigation performance.

Conclusions: The proposed deep reinforcement learning visual target navigation method based on attention mechanism and reward shaping further improves the navigation success rate and efficiency by building better states and a better reward space.
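The attention steps described in the Methods can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: each image is assumed to be a grid of region features (a hypothetical 7x7 map of 64-dimensional features, flattened to 49 rows), the previous action and the target are assumed to be embedded as single 64-dimensional vectors that act as queries, and the learned query/key/value projections and the policy network are omitted. Which tensor plays the query role, as well as `build_state` and the other names, are assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """softmax(Q K^T / sqrt(d_k)) V for q: (m, d), k: (n, d), v: (n, d_v)."""
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # (m, n), each row sums to 1
    return weights @ v, weights


def build_state(prev_regions, cur_regions, prev_action_emb, target_emb):
    """Hypothetical state builder: attended path area + attended target area."""
    # 1) Path area attended at t-1: the previous action queries the previous image.
    prev_path, _ = scaled_dot_product_attention(prev_action_emb, prev_regions, prev_regions)
    # 2) Path area attended at t: the t-1 focus queries the current image,
    #    which ties the state at t to the state at t-1.
    cur_path, _ = scaled_dot_product_attention(prev_path, cur_regions, cur_regions)
    # 3) Target area attended at t: the target embedding queries the current image.
    cur_target, _ = scaled_dot_product_attention(target_emb, cur_regions, cur_regions)
    # 4) Concatenate path focus and target focus into one state vector.
    return np.concatenate([cur_path, cur_target], axis=-1).ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_regions, d = 49, 64  # hypothetical 7x7 feature map with 64-dim features
    prev_regions = rng.standard_normal((n_regions, d))
    cur_regions = rng.standard_normal((n_regions, d))
    prev_action_emb = rng.standard_normal((1, d))
    target_emb = rng.standard_normal((1, d))
    state = build_state(prev_regions, cur_regions, prev_action_emb, target_emb)
    print(state.shape)  # (128,)
```

Because the current path focus is computed with the previous focus as its query, the state at step t depends on what the agent attended to at step t-1, which is the state relationship the method sets out to introduce.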
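The abstract does not spell out the exact reward shaping rule, so the sketch below swaps in a standard technique as an assumption: potential-based reward shaping, with the potential of a state defined as the cosine similarity between an embedding of the current visual observation and the target embedding. Moving to a view that looks more like the target earns a bonus, and moving away costs a penalty, which densifies an otherwise sparse reward. The step penalty, success reward, discount and scale are hypothetical values, and the embeddings are assumed to come from a visual encoder that is not shown.

```python
import numpy as np

STEP_PENALTY = -0.01   # hypothetical per-step cost
SUCCESS_REWARD = 5.0   # hypothetical reward for a correct "done"


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """cos(a, b); eps guards against division by zero for zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def base_reward(done_issued: bool, success: bool) -> float:
    """Sparse environment reward: only a successful 'done' is rewarded."""
    if done_issued and success:
        return SUCCESS_REWARD
    return STEP_PENALTY


def shaped_reward(r_base: float, prev_obs_emb: np.ndarray, next_obs_emb: np.ndarray,
                  target_emb: np.ndarray, terminal: bool = False,
                  gamma: float = 0.99, scale: float = 1.0) -> float:
    """r' = r + gamma * phi(s') - phi(s), with phi(s) = scale * cos(obs, target).

    phi(s') is set to 0 at terminal states, as potential-based shaping
    requires for episodic tasks.
    """
    phi_prev = scale * cosine_similarity(prev_obs_emb, target_emb)
    phi_next = 0.0 if terminal else scale * cosine_similarity(next_obs_emb, target_emb)
    return r_base + gamma * phi_next - phi_prev


if __name__ == "__main__":
    target = np.array([1.0, 0.0])
    away, toward = np.array([0.0, 1.0]), np.array([1.0, 0.0])
    r = base_reward(done_issued=False, success=False)
    print(shaped_reward(r, away, toward, target))  # about 0.98: turned toward the target
    print(shaped_reward(r, toward, away, target))  # about -1.01: turned away from it
```

Potential-based shaping is chosen here because it is known not to change which policies are optimal; whether the paper's cosine-similarity rule shares that property cannot be determined from the abstract.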


Key words

visual navigation/visual target navigation/deep reinforcement learning/attention mechanism/reward shaping


Funding

Major Science and Technology Project of Hubei Province (2022AAA009)

Open Fund of Hubei Luojia Laboratory ()

Year of publication

2024

Geomatics and Information Science of Wuhan University

Wuhan University


Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.072

ISSN: 1671-8860

Number of references: 4
