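Using reinforcement learning to formulate mobile crowd-sensing strategies for multiple agents is a technique widely adopted in academia. However, existing approaches commonly use the state and action information of all agents during training without any processing, update their policies inefficiently, and give little consideration to differences in the importance of the objects being collected. Taking the commander-UAV master-slave collaboration scenario as its object of study, this paper proposes a master-slave collaborative crowd-sensing algorithm based on a matching game and a communication mechanism. First, by introducing the idea of the Gale-Shapley matching game algorithm, an optimal stable matching is established between the energy attributes of the UAVs and the data quality attributes of the targets to be collected, yielding a collection strategy that prioritizes data importance. To ensure that the UAVs keep sustained, specific attention on high-quality targets, this paper builds on the framework of MAAC (Multi-Actor-Attention-Critic), a currently popular communication-rule-based multi-agent reinforcement learning algorithm, and introduces a multi-attention mechanism module, achieving efficient information exchange and sharing between the master and slave agents during data collection. Experiments show that the proposed c-MGCM (Crowd-sensing Method based on Matching Game and Communication Mechanism) outperforms classical algorithms such as MADDPG (Multi-Agent Deep Deterministic Policy Gradient) and DDPG (Deep Deterministic Policy Gradient) on several evaluation metrics, including reward value and the distance between matched pairs, with a 2 to 3 times improvement in reward value and an improvement of at least 14% in data quality. These results demonstrate the efficiency and stability of the c-MGCM method.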
A Master-slave Collaborative Mobile Crowd-sensing Algorithm Based on Matching Game and Communication Mechanism
Reinforcement learning is widely used in academia to formulate mobile crowd-sensing strategies for multiple agents, but existing approaches commonly use the state and action information of all agents during training without any processing, update their policies inefficiently, and neglect differences in the importance of the objects to be collected. In this paper, a master-slave collaborative crowd-sensing algorithm based on a matching game and a communication mechanism is proposed for the commander-UAV master-slave collaboration scenario. First, by introducing the idea of the Gale-Shapley matching game algorithm, an optimal and stable matching between the energy attributes of the UAVs and the data quality attributes of the targets to be collected is established, realizing an acquisition strategy that prioritizes data importance. In addition, to ensure that the UAVs maintain continuous and specific attention to high-quality targets, we build on the MAAC (Multi-Actor-Attention-Critic) framework, a currently popular communication-rule-based multi-agent reinforcement learning algorithm, and introduce a multi-attention mechanism module to achieve efficient information exchange and sharing between master and slave agents during data acquisition. Experiments show that the proposed Crowd-sensing Method based on Matching Game and Communication Mechanism (c-MGCM) outperforms classical algorithms such as MADDPG and DDPG on several evaluation metrics, including reward value and the distance between matched pairs, with a 2-3 times improvement in reward value and at least a 14% improvement in data quality. The results demonstrate the efficiency and stability of the c-MGCM method.
mobile crowd-sensing; data collection; matching game; communication mechanism
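The matching step named in the abstract can be illustrated with a minimal Python sketch of the classical Gale-Shapley deferred-acceptance procedure. The abstract does not specify how preferences are built, so the rule used here is an assumption made only for illustration: every UAV ranks targets by descending data quality, and every target ranks UAVs by descending remaining energy. The names `gale_shapley`, `match_by_attributes`, `energy`, and `quality` are hypothetical and do not come from the paper.

```python
from typing import Dict, List


def gale_shapley(
    uav_prefs: Dict[str, List[str]],
    target_prefs: Dict[str, List[str]],
) -> Dict[str, str]:
    """UAV-proposing deferred acceptance; returns a stable UAV -> target map.

    Assumes equally many UAVs and targets and complete preference lists.
    """
    # rank[t][u] is the position of UAV u in target t's list (lower is better).
    rank = {
        t: {u: i for i, u in enumerate(prefs)}
        for t, prefs in target_prefs.items()
    }
    next_choice = {u: 0 for u in uav_prefs}  # next list index each UAV proposes to
    engaged: Dict[str, str] = {}             # target -> currently accepted UAV
    free = list(uav_prefs)                   # UAVs that still need a target

    while free:
        u = free.pop()
        t = uav_prefs[u][next_choice[u]]
        next_choice[u] += 1
        current = engaged.get(t)
        if current is None:
            engaged[t] = u                   # target was unmatched: accept
        elif rank[t][u] < rank[t][current]:
            engaged[t] = u                   # target prefers the new proposer
            free.append(current)
        else:
            free.append(u)                   # proposal rejected: try next target
    return {u: t for t, u in engaged.items()}


def match_by_attributes(
    energy: Dict[str, float], quality: Dict[str, float]
) -> Dict[str, str]:
    """Build preferences from attributes (illustrative assumption) and match."""
    targets_by_quality = sorted(quality, key=lambda t: -quality[t])
    uavs_by_energy = sorted(energy, key=lambda u: -energy[u])
    uav_prefs = {u: list(targets_by_quality) for u in energy}
    target_prefs = {t: list(uavs_by_energy) for t in quality}
    return gale_shapley(uav_prefs, target_prefs)
```

Under this assumed preference rule all UAVs share one ranking and all targets share another, so the stable matching simply pairs the UAV with the most energy with the highest-quality target, the second with the second, and so on. For example, `match_by_attributes({"uav0": 0.9, "uav1": 0.5, "uav2": 0.7}, {"t0": 0.2, "t1": 0.8, "t2": 0.6})` pairs `uav0` with `t1`, `uav2` with `t2`, and `uav1` with `t0`. Preferences that also depend on distance or other factors would no longer reduce to a plain sort, which is where the general procedure becomes useful.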