An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments

Chen Yang (陈洋)¹, Shi Dianxi (史殿习)², Yang Huanhuan (杨焕焕)³, Li Tongyue (李彤月)⁴, Wang Zhen (王震)⁴

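Author information

1. School of Computer Science, Peking University, Beijing 100871, China
2. Tianjin Artificial Intelligence Innovation Center, Tianjin 300457, China; Intelligent Game and Decision Laboratory, Beijing 100071, China
3. College of Computer, National University of Defense Technology, Changsha 410073, China
4. Intelligent Game and Decision Laboratory, Beijing 100071, China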

Abstract

This paper addresses search-and-rescue tasks for a mobile robot with multiple targets of interest in an unknown dynamic environment. The problem is challenging because the robot must search for multiple targets while avoiding obstacles at the same time. To ensure that the robot avoids collisions properly, we propose a mixed-strategy Nash equilibrium based Dyna-Q (MNDQ) algorithm. First, a multi-objective layered structure is introduced to simplify the representation of multiple objectives and reduce computational complexity. This structure divides the overall task into subtasks, including searching for targets and avoiding obstacles. Second, a risk-monitoring mechanism is proposed based on the relative positions of dynamic risks. This mechanism helps the robot avoid potential collisions and unnecessary detours. Third, to improve sampling efficiency, MNDQ is presented, which combines Dyna-Q with mixed-strategy Nash equilibrium. Under a mixed-strategy Nash equilibrium, the agent makes decisions in the form of probabilities so as to maximize the expected return, which improves the overall performance of the Dyna-Q algorithm. Finally, a series of simulations is conducted to verify the effectiveness of the proposed method. The results show that MNDQ performs well and exhibits robustness, providing a competitive solution for future autonomous robot navigation tasks.
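
For readers unfamiliar with the Dyna-Q component that MNDQ builds on, the following is a minimal sketch of textbook tabular Dyna-Q, not the authors' MNDQ. The grid world (`step`), its rewards, the obstacle position, and all hyperparameters are assumptions made purely for illustration, and the mixed-strategy Nash equilibrium action selection is omitted because the abstract does not specify it; plain epsilon-greedy selection stands in for it here.

```python
import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step(state, action, size=4, goal=(3, 3), obstacles=frozenset({(1, 1)})):
    """Hypothetical static grid world: returns (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < size and 0 <= c < size) or (r, c) in obstacles:
        return state, -1.0, False      # blocked move: stay in place, pay a penalty
    if (r, c) == goal:
        return (r, c), 10.0, True      # target reached
    return (r, c), -0.1, False         # ordinary move: small step cost

def dyna_q(episodes=200, planning_steps=10, alpha=0.5, gamma=0.95,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)             # Q[(state, action_index)], initialised to 0
    model = {}                         # learned model: (state, a) -> (next_state, reward, done)

    def greedy(s):
        return max(range(len(ACTIONS)), key=lambda a: q[(s, a)])

    def update(s, a, s2, r, done):
        # One-step Q-learning backup, used for both real and simulated experience.
        best_next = 0.0 if done else max(q[(s2, b)] for b in range(len(ACTIONS)))
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    for _ in range(episodes):
        s, done, steps = (0, 0), False, 0
        while not done and steps < 200:
            # Epsilon-greedy stand-in for the paper's probabilistic decision rule.
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, ACTIONS[a])
            update(s, a, s2, r, done)          # direct reinforcement learning
            model[(s, a)] = (s2, r, done)      # model learning
            for _ in range(planning_steps):    # planning from remembered transitions
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                update(ps, pa, ps2, pr, pdone)
            s, steps = s2, steps + 1
    return q, greedy
```

The planning loop is what gives Dyna-Q its sampling efficiency: each real transition is followed by several simulated backups drawn from the learned model, so fewer interactions with the environment are needed. Calling `q, greedy = dyna_q()` and then repeatedly applying `step(s, ACTIONS[greedy(s)])` from `(0, 0)` follows the learned policy to the goal cell in this toy world.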

Key words

Search and rescue / Reinforcement learning / Game theory / Collision avoidance / Decision-making

Funding

National Natural Science Foundation of China (No. 91948303)

Publication year

2024

Frontiers of Information Technology & Electronic Engineering

Zhejiang University

CSTPCD

Impact factor: 0.371

ISSN: 2095-9184

Number of references: 62
