Integrated avionics system safety optimization method based on deep reinforcement learning

ZHAO Changxiao (赵长啸)¹, LI Daojun (李道俊)², SUN Yixuan (孙亦轩)², JING Peng (景鹏)², TIAN Yi (田毅)¹

Author information

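1. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China; Key Laboratory of Civil Aircraft Airworthiness Certification Technology, Civil Aviation University of China, Tianjin 300300, China
2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China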

Abstract

Traditional safety design methods based on manual inspection struggle to cope with the explosion of candidate hosting schemes caused by the large-scale integration of avionics systems. To address this, an avionics system partition model, a task model, and a safety criticality level quantification model were constructed, and the safety-aware integrated design optimization problem was formulated as a Markov decision process (MDP). An optimization method using the Soft Actor-Critic (SAC) algorithm, which is built on the Actor-Critic framework, was proposed. To determine how the choice of SAC parameters correlates with training results, the sensitivity of the algorithm to its parameters was studied. To verify the advantage of the SAC-based method in optimizing safety-aware integrated design, comparison experiments were carried out against the Deep Deterministic Policy Gradient (DDPG) algorithm and a traditional allocation algorithm. The results show that, under the best parameter combination, the maximum reward after SAC convergence is nearly 8% higher than under the other parameter combinations, and the convergence time is nearly 16.6% shorter. Compared with the DDPG algorithm and the traditional allocation algorithm under the same parameter settings, the SAC-based method achieves maximum improvements of 62%, 7464%, 8370%, 2123%, and 775% in maximum reward, cumulative constraint violation rate, partition risk balance, partition resource utilization, and solution time, respectively.
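To make the MDP formulation mentioned in the abstract more concrete, the following is a minimal sketch of how assigning tasks to partitions could be framed as a sequential decision problem. It is not the authors' model: the `Task` fields, the state tuple, the risk-imbalance reward, and the `violation_penalty` value are all assumptions chosen for illustration, and the sketch uses a discrete action (a partition index) with a hand-written policy rather than an SAC agent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    demand: float     # resource demand (hypothetical unit)
    criticality: int  # quantified safety criticality level (assumed scale, larger = more critical)


@dataclass
class AllocationMDP:
    """Toy MDP: at each step, the next task is hosted in one partition."""
    tasks: list
    capacities: list                  # resource capacity per partition (assumed)
    violation_penalty: float = 10.0   # assumed penalty for exceeding a capacity
    loads: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    step_index: int = 0

    def reset(self):
        self.loads = [0.0] * len(self.capacities)
        self.risks = [0] * len(self.capacities)
        self.step_index = 0
        return self.state()

    def state(self):
        # State: index of the next task, plus per-partition load and accumulated risk.
        return (self.step_index, tuple(self.loads), tuple(self.risks))

    def step(self, action):
        # Action: index of the partition that will host the current task.
        task = self.tasks[self.step_index]
        self.loads[action] += task.demand
        self.risks[action] += task.criticality
        # Reward: penalize risk imbalance across partitions (assumed objective).
        reward = -(max(self.risks) - min(self.risks))
        # Constraint: penalize exceeding the chosen partition's capacity.
        if self.loads[action] > self.capacities[action]:
            reward -= self.violation_penalty
        self.step_index += 1
        done = self.step_index == len(self.tasks)
        return self.state(), reward, done


def rollout(env, policy):
    """Run one episode and return the total reward collected by `policy`."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total
```

In this sketch, a policy that spreads equally critical tasks across partitions collects a higher total reward than one that places every task in the same partition. A learned policy, such as one trained with SAC or DDPG, would take the place of the hand-written `policy` function.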

Key words

deep reinforcement learning/integrated modular avionics/safety/optimization method/Markov decision process (MDP)/integrated design

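Funding

National Key Research and Development Program of China (2021YFB1600601)

Tianjin Graduate Education Reform Research Program for Higher Education Institutions (TJYG135)

Civil Aviation University of China Graduate Research and Innovation Funding Program (2023YJSKC09015)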

Publication year

2024

China Safety Science Journal (中国安全科学学报)

China Occupational Safety and Health Association

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.548

ISSN: 1003-3033

Number of references: 7
