Robotics & Machine Learning Daily News 2024, Issue (Jun. 4): 22-23.

Study Findings from Yanshan University Provide New Insights into Robotics and Automation (Marrgm: Learning Framework for Multiagent Reinforcement Learning Via Reinforcement Recommendation and Group Modification)

Abstract

By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News – Researchers detail new data in Robotics - Robotics and Automation. According to news reporting originating from Qinhuangdao, People’s Republic of China, by NewsRx correspondents, research stated, “Sample usage efficiency is an important factor affecting the convergence speed of multi-agent deep reinforcement learning (MADRL) algorithms. Most existing experience replay (ER) methods manually select experience samples to update the agent’s policy.”

Financial support for this research came from the National Natural Science Foundation of China (NSFC).

Our news editors obtained a quote from the research from Yanshan University, “It is difficult to give suitable and efficient experience samples for different stages of agent policy learning as well as to effectively mine the potential value of experience samples in the replay buffer. Inspired by the idea of recommendation systems, this paper proposes a MADRL framework based on reinforcement recommendation and group modification to improve sample use efficiency and the ability to find the optimal solution of the multi-agent system in different task scenario categories. First, we use the sampling probability of each experience sample output from the recommendation network to recommend sampling instead of manual sampling; simultaneously, we collect the performance of the multi-agent system after updating the policy with the experience sample of recommendation sampling and construct the reinforcement learning process of the recommendation network. Next, we modify the individual policy of the agent according to the group rewards to improve the agent’s ability to learn the optimal solution. We then combine and embed the reinforcement recommendation and group modification modules into the MADRL algorithm MAAC. Finally, we experiment with task scenarios, including cooperative collection, command movement, and target navigation, and extend this framework to the MADDPG algorithm to verify its scalability.”
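The sampling mechanism described in the abstract can be illustrated with a small, self-contained sketch. This is a hedged illustration under stated assumptions, not the authors' implementation: the linear scorer, the hand-picked per-sample features, and the REINFORCE-style update below are stand-ins for the paper's recommendation network and its reinforcement learning process; the class name `RecommendedReplayBuffer` is hypothetical.

```python
# Hypothetical sketch: a replay buffer whose sampling distribution is produced
# by a learned scorer ("recommendation network") rather than uniform sampling,
# and which is reinforced by the post-update performance of the agents.
import numpy as np

rng = np.random.default_rng(0)

class RecommendedReplayBuffer:
    def __init__(self, feature_dim, lr=0.01):
        self.samples = []                # stored transitions
        self.features = []               # per-sample features (e.g. TD error, age)
        self.w = np.zeros(feature_dim)   # weights of the linear recommendation scorer
        self.lr = lr

    def add(self, transition, feature):
        self.samples.append(transition)
        self.features.append(np.asarray(feature, dtype=float))

    def probabilities(self):
        # Softmax over scores gives each sample a sampling probability.
        scores = np.array([f @ self.w for f in self.features])
        scores -= scores.max()           # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def sample(self, batch_size):
        # Recommended sampling instead of manual/uniform sampling.
        p = self.probabilities()
        idx = rng.choice(len(self.samples), size=batch_size, replace=False, p=p)
        return idx, [self.samples[i] for i in idx]

    def reinforce(self, idx, reward):
        # REINFORCE-style update: if the multi-agent system performed well
        # after updating its policy on this batch, raise the sampling
        # probability of the batch's samples.
        p = self.probabilities()
        mean_f = sum(pi * f for pi, f in zip(p, self.features))
        for i in idx:
            self.w += self.lr * reward * (self.features[i] - mean_f)
```

In use, the outer training loop would sample a batch, update the agents' policies with it, measure the resulting performance change, and feed that change back through `reinforce`, closing the reinforcement learning loop around the recommender.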

Key words

Qinhuangdao/People’s Republic of China/Asia/Robotics and Automation/Robotics/Algorithms/Emerging Technologies/Machine Learning/Reinforcement Learning/Yanshan University

Publication Year

2024
Robotics & Machine Learning Daily News
