This study investigates the application of a Deep Q-Network (DQN) algorithm that integrates Navigational Priority (NP) and Prioritized Experience Replay (PER) strategies for intelligent path planning in maritime environments. Unlike conventional path planning algorithms, our optimized model autonomously explores and learns the patterns of the maritime environment without relying on manually constructed global maritime information. We developed a maritime simulation environment based on the Gym framework to simulate and validate the improved DQN model. The model incorporates the Navigational Priority and Prioritized Experience Replay mechanisms, improving learning efficiency on critical decisions by adjusting how frequently experience samples are reused during training. In addition, a novel reward function further strengthens the model's adaptability and stability in path planning tasks. Simulation experiments demonstrate that our model significantly outperforms baseline methods in avoiding obstacles and finding optimal routes, exhibiting strong generalizability and stability.
Key words
improved deep Q-Network/maritime simulation environment/navigational priority/reward function
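The Prioritized Experience Replay mechanism named in the abstract adjusts how often each stored transition is replayed according to its learning value. As a rough illustration, a standard proportional-priority buffer can be sketched as follows; this is a generic sketch (class name, hyperparameter defaults, and the TD-error-based priority are illustrative assumptions, not the authors' implementation):

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional-priority replay buffer (illustrative sketch).

    Transitions with larger TD error receive larger priority and are
    therefore sampled more often, which is the frequency-adjustment
    idea the abstract refers to.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priority skews sampling
        self.data = []
        self.priorities = []
        self.pos = 0                # ring-buffer write position

    def add(self, transition, td_error=1.0):
        # Priority is |TD error|^alpha, with a small epsilon so that
        # no transition ever has zero probability of being replayed.
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sample indices proportionally to priority, then compute
        # importance-sampling weights to correct the induced bias.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.data)), weights=probs,
                              k=batch_size)
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize to [0, 1]
        return idxs, [self.data[i] for i in idxs], weights
```

In a DQN training loop, `add` would be called after each environment step with the transition's TD error, and `sample` would supply each minibatch together with the importance weights used to scale the loss.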