Study Findings from Yanshan University Provide New Insights into Robotics and Automation (Marrgm: Learning Framework for Multiagent Reinforcement Learning Via Reinforcement Recommendation and Group Modification)
By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News – Researchers detail new data in Robotics - Robotics and Automation. According to news reporting originating from Qinhuangdao, People's Republic of China, by NewsRx correspondents, research stated, "Sample usage efficiency is an important factor affecting the convergence speed of multi-agent deep reinforcement learning (MADRL) algorithms. Most existing experience replay (ER) methods manually select experience samples to update the agent's policy."

Financial support for this research came from the National Natural Science Foundation of China (NSFC).

Our news editors obtained a quote from the research from Yanshan University: "It is difficult to give suitable and efficient experience samples for different stages of agent policy learning, as well as to effectively mine the potential value of experience samples in the replay buffer. Inspired by the idea of recommendation systems, this paper proposes a MADRL framework based on reinforcement recommendation and group modification to improve sample use efficiency and the ability to find the optimal solution of the multi-agent system in different task scenario categories. First, we use the sampling probability of each experience sample output from the recommendation network to recommend sampling instead of manual sampling; simultaneously, we collect the performance of the multi-agent system after updating the policy with the experience sample of recommendation sampling and construct the reinforcement learning process of the recommendation network. Next, we modify the individual policy of the agent according to the group rewards to improve the agent's ability to learn the optimal solution. We then combine and embed the reinforcement recommendation and group modification modules into the MADRL algorithm MAAC.
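The two mechanisms the quote describes can be illustrated with a small sketch. The code below is not the paper's implementation: it assumes a simple linear scorer in place of the recommendation network, a REINFORCE-style update driven by post-update performance, and a plain additive mixing rule for group modification. All names (`RecommendationSampler`, `group_modified_reward`, `beta`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class RecommendationSampler:
    """Sketch of recommendation-based replay sampling: a linear scorer
    assigns each buffered transition a score, a softmax over scores gives
    sampling probabilities, and minibatches are drawn from those
    probabilities instead of uniformly (i.e., instead of manual sampling)."""

    def __init__(self, feature_dim, lr=0.1):
        self.w = np.zeros(feature_dim)  # scorer parameters
        self.lr = lr

    def probs(self, features):
        # features: (N, d) array, one row per buffered transition
        scores = features @ self.w
        scores -= scores.max()          # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def sample(self, features, batch_size):
        p = self.probs(features)
        return rng.choice(len(features), size=batch_size, replace=False, p=p)

    def reinforce(self, features, idx, reward):
        """REINFORCE-style update for the scorer: `reward` is the measured
        change in multi-agent performance after updating the policy with the
        recommended batch. Positive reward raises the probability of the
        sampled transitions; negative reward lowers it."""
        p = self.probs(features)
        baseline = (p[:, None] * features).sum(axis=0)
        grad = features[idx].sum(axis=0) - len(idx) * baseline
        self.w += self.lr * reward * grad

def group_modified_reward(individual_reward, group_reward, beta=0.5):
    """Hedged stand-in for group modification: shift each agent's reward
    toward the group's reward so individual policies account for the team.
    The paper's exact modification rule may differ."""
    return individual_reward + beta * group_reward
```

In a full training loop, `reward` passed to `reinforce` would be something like the change in episode return after one policy update with the recommended batch, closing the recommendation network's own reinforcement learning loop; `group_modified_reward` would reshape each agent's reward before its individual policy update.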
Finally, we experiment with task scenarios, including cooperative collection, command movement, and target navigation, and extend this framework to the MADDPG algorithm to verify its scalability."
Keywords: Qinhuangdao, People's Republic of China, Asia, Robotics and Automation, Robotics, Algorithms, Emerging Technologies, Machine Learning, Reinforcement Learning, Yanshan University