Integrated avionics system safety optimization method based on deep reinforcement learning

Traditional safety design methods based on manual inspection struggle to cope with the explosion of candidate hosting schemes caused by the large-scale integration of avionics systems. To address this, an avionics system partition model, a task model, and a safety criticality level quantification model are constructed; the safety-aware integrated design optimization problem is formulated as a Markov decision process (MDP); and an optimization method based on the Soft Actor-Critic (SAC) algorithm, built on the Actor-Critic framework, is proposed. To determine how the choice of SAC parameters correlates with training results, the parameter sensitivity of the algorithm is studied. To verify the advantage of the SAC-based method in optimizing safety-aware integrated design, comparative optimization experiments are carried out against the Deep Deterministic Policy Gradient (DDPG) algorithm and the traditional allocation algorithm. The results show that, under the best parameter combination, the maximum reward of the SAC algorithm after convergence is nearly 8% higher than under the other parameter combinations, and the convergence time is nearly 16.6% shorter. Compared with the DDPG algorithm and the traditional allocation algorithm under the same parameter settings, the SAC-based method achieves maximum improvements of 62%, 7464%, 8370%, 2123% and 775% in maximum reward, cumulative constraint violation rate, partition balance risk effect, partition resource utilization and solution time, respectively.

Keywords: deep reinforcement learning; integrated modular avionics; safety; Markov decision process (MDP); integrated design

ZHAO Changxiao (赵长啸), LI Daojun (李道俊), SUN Yixuan (孙亦轩), JING Peng (景鹏), TIAN Yi (田毅)

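College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China

Key Laboratory of Civil Aircraft Airworthiness Certification Technology, Civil Aviation University of China, Tianjin 300300, China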

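Keywords (Chinese record): deep reinforcement learning; integrated avionics system; safety; optimization method; Markov decision process (MDP); integrated design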

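Funding: National Key Research and Development Program of China (2021YFB1600601); Tianjin Higher Education Graduate Education Reform Research Program (TJYG135); Civil Aviation University of China Graduate Research and Innovation Funding Program (2023YJSKC09015)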

2024

China Safety Science Journal
China Occupational Safety and Health Association

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.548
ISSN: 1003-3033
Year, volume (issue): 2024, 34(7)