PPO-based multi-objective task allocation method for heterogeneous crowd sensing

Task allocation in existing mobile crowd sensing systems mainly targets a single type of mobile user, and heterogeneous crowd sensing, in which several types of mobile users coexist, has received comparatively little attention. To address this, we define area accessibility for heterogeneous mobile users and classify the sensing sub-regions by type. Taking into account the time-varying number of sensing tasks and the time-varying number of mobile users, we then construct a multi-objective constrained optimization model for task allocation in dynamic heterogeneous crowd sensing systems. The model maximizes sensing quality and minimizes sensing cost, subject to constraints such as the maximum number of tasks each user can perform and the limited working time of unmanned aerial vehicles (UAVs). To solve this problem, we propose a multi-objective evolutionary optimization algorithm based on proximal policy optimization (PPO). PPO selects the evolutionary operator with the highest reward value according to the current evolutionary state of the population, and that operator generates the offspring population. In comparative experiments against several algorithms on different heterogeneous crowd sensing instances, the Pareto-optimal solution sets obtained by the proposed algorithm show the best convergence and distribution, and the operator selection strategy effectively improves adaptability to time-varying factors and overall algorithm performance.
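The notion of a Pareto-optimal set over the two objectives named in the abstract (maximize sensing quality, minimize sensing cost) can be illustrated with a minimal sketch. This is not the authors' algorithm: the `(quality, cost)` tuple encoding, the function names, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical encoding of one candidate task allocation:
# (sensing_quality, sensing_cost). Quality is maximized, cost is minimized.
Solution = Tuple[float, float]


def dominates(a: Solution, b: Solution) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates (input order preserved)."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


if __name__ == "__main__":
    # Made-up values, not results from the paper.
    candidates = [(0.9, 10.0), (0.8, 6.0), (0.7, 8.0), (0.5, 3.0), (0.9, 12.0)]
    print(pareto_front(candidates))
```

This pairwise filter is quadratic in the number of candidates, which is adequate for small populations. Evolutionary multi-objective algorithms typically use faster non-dominated sorting and also measure how evenly the retained solutions are spread.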

Keywords: heterogeneous crowd sensing; multi-objective optimization; reinforcement learning; proximal policy optimization

Yang Xiao, Guo Yinan, Ji Jianjiao, Liu Xu

School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116

School of Mechanical and Electrical Engineering, China University of Mining and Technology (Beijing), Beijing 100083

Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, Jiangsu 221008

Funding: National Natural Science Foundation of China (61973305, U23A20340, 52121003); National Key Research and Development Program of China (2022YFB4703700)

2024

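Control Theory & Applications (控制理论与应用)
South China University of Technology; Academy of Mathematics and Systems Science, Chinese Academy of Sciences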

Indexed in: CSTPCD; PKU Core
Impact factor: 1.076
ISSN: 1000-8152
Year, volume (issue): 2024, 41(6)