Deep reinforcement learning algorithm for dynamic flow shop real-time scheduling problem
This paper addresses the dynamic flow shop scheduling problem (DFSP) and proposes an adaptive deep reinforcement learning algorithm (ADRLA) to minimize the maximum completion time (makespan) of the DFSP. Firstly, the solving process of the DFSP is described as a Markov decision process (MDP), which transforms the DFSP into a sequential decision problem that can be solved by reinforcement learning. Then, according to the characteristics of the DFSP scheduling model, a state representation vector with good state-feature discrimination and generalization is designed, and five specific actions, together with the corresponding reward value, are proposed. Furthermore, a double deep Q-network (DDQN) is used as the agent in ADRLA to make scheduling decisions. After training on a data set built from a small number of small-scale DFSPs (i.e., the data of the three basic elements, namely state, action, and reward, on different problems), the agent can accurately capture the nonlinear relationship between the state representation vector and the Q-value vector (composed of the Q-value of each action) for DFSPs of different scales, and can therefore carry out adaptive real-time scheduling for DFSPs of various scales. Finally, simulation experiments on different test problems, together with comparisons against existing algorithms, verify the effectiveness and real-time performance of the proposed ADRLA in solving the DFSP.
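The DDQN decision rule described above (the agent maps a state representation vector to a Q-value vector with one entry per action) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear Q-functions, the state dimension `N_STATE`, and the discount factor are hypothetical stand-ins, with a real agent using deep networks trained on scheduling data. The sketch shows the core double-DQN idea: the online network selects the next action, while the target network evaluates it.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE = 6    # length of the state representation vector (assumed for illustration)
N_ACTIONS = 5  # the five scheduling actions from the abstract

# Hypothetical linear Q-functions standing in for the online and target
# deep networks of the DDQN agent.
W_online = rng.normal(size=(N_ACTIONS, N_STATE))
W_target = rng.normal(size=(N_ACTIONS, N_STATE))

def q_values(W, state):
    """Q-value vector (one Q-value per action) for a given state vector."""
    return W @ state

def ddqn_target(reward, next_state, gamma=0.95):
    """Double-DQN target: the online network selects the greedy action,
    the target network evaluates it; this decoupling reduces the
    Q-value overestimation of vanilla DQN."""
    a_star = int(np.argmax(q_values(W_online, next_state)))
    return reward + gamma * q_values(W_target, next_state)[a_star]

# At each decision point, the agent picks the action with the highest Q-value.
state = rng.normal(size=N_STATE)
action = int(np.argmax(q_values(W_online, state)))
target = ddqn_target(reward=1.0, next_state=state)
```

During training, `target` would serve as the regression label for the online network's Q-value of the taken action, and the target network's weights would be periodically synchronized with the online network's.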
flow shop scheduling; arrival of new jobs; deep reinforcement learning; dynamic real-time scheduling; intelligent scheduling