Deep Reinforcement Learning Model for Job Shop Scheduling Problems with Uncertainty

Wu Xinquan (吴新泉)¹, Yan Xuefeng (燕雪峰)², Wei Mingqiang (魏明强)², Guan Donghai (关东海)²

Author information

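1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106
2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093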

Abstract

The job shop scheduling problem (JSSP) is a classical combinatorial optimization problem that is non-deterministic polynomial (NP)-hard. JSSP formulations usually assume that the scheduling environment is fully known and remains unchanged during scheduling. In practice, however, scheduling is often disrupted by uncertain factors such as machine failures and process changes. This paper proposes a scheduling algorithm, proximal policy optimization with hybrid prioritized experience replay (HPER-PPO), for solving JSSPs under uncertainty. The JSSP is modeled as a Markov decision process, for which the state features, reward function, action space, and scheduling policy network are designed. To improve the convergence of the deep reinforcement learning model, a new hybrid prioritized experience replay training method is proposed. The method is evaluated on standard benchmark datasets and on datasets generated from them. In static scheduling experiments, the proposed model achieves more accurate results than existing deep reinforcement learning methods and priority dispatching rules. In dynamic scheduling experiments with uncertainty in the process order, it obtains more accurate schedules within a reasonable time.
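The abstract names two building blocks, proximal policy optimization and prioritized experience replay, without giving their formulas. As a rough illustration only, the sketch below shows the textbook forms of both: the clipped PPO surrogate for a single sample, and proportional prioritized sampling from stored experience. It is not the paper's hybrid scheme, whose details are not stated here; the function names and the parameters `alpha`, `eps`, and `clip` are assumptions chosen for this example.

```python
import random


def clipped_surrogate(ratio, advantage, clip=0.2):
    """Standard PPO clipped objective for one sample (not the paper's exact loss).

    ratio is pi_new(a|s) / pi_old(a|s); clip is an assumed hyperparameter.
    """
    clipped_ratio = max(1.0 - clip, min(ratio, 1.0 + clip))
    return min(ratio * advantage, clipped_ratio * advantage)


def priority_probs(errors, alpha=0.6, eps=1e-6):
    """Generic proportional prioritization: P(i) is proportional to (|error_i| + eps) ** alpha.

    alpha = 0 reduces to uniform sampling; larger alpha favours high-error samples.
    """
    priorities = [(abs(e) + eps) ** alpha for e in errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(errors, batch_size, alpha=0.6, rng=None):
    """Draw indices of stored transitions (with replacement) according to priority."""
    rng = rng or random.Random()
    probs = priority_probs(errors, alpha)
    return rng.choices(range(len(errors)), weights=probs, k=batch_size)


if __name__ == "__main__":
    errors = [0.1, 1.0, 2.0]  # hypothetical per-transition error magnitudes
    print(priority_probs(errors))
    print(sample_indices(errors, batch_size=5, rng=random.Random(0)))
    print(clipped_surrogate(ratio=1.5, advantage=1.0))
```

In this sketch a transition with a larger error is replayed more often, while the clip keeps any single policy update from moving too far from the policy that collected the data. How HPER-PPO combines or modifies these pieces is left to the full paper.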

Key words

job shop scheduling problem/deep reinforcement learning/proximal policy optimization/prioritized experience replay

Publication year

2024

Journal of Data Acquisition and Processing (数据采集与处理)

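Signal Processing Society of the Chinese Institute of Electronics and the China Instrument and Control Society; Weak Signal Detection Society of the China Instrument and Control Society and the Chinese Physical Society; Nanjing University of Aeronautics and Astronautics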

Indexed in: CSTPCD, Peking University Core Journals (北大核心)

Impact factor: 0.679

ISSN: 1004-9037
