
Real-time Scheduling Method for Just-in-Time Material Handling Systems Based on Improved Reinforcement Learning

A punctual and efficient material handling system guarantees the continuous and stable operation of assembly manufacturing. To respond dynamically to changes in assembly-line state and to effectively balance the production efficiency and energy consumption of mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm: its system states, action strategies, and reward function are designed, a neural network is introduced to generalize and approximate the Q-value function, and the policy selection mechanism is improved, yielding a reinforcement learning dynamic scheduling method based on a two-parameter greedy policy. Simulation results show that, compared with other scheduling methods, this reinforcement learning approach achieves better optimization of material handling scheduling: it effectively reduces the handling distance while ensuring that materials are delivered to the assembly line on time and that maximum output is achieved.
Real-time Scheduling Method Based on Reinforcement Learning for Material Handling in Assembly Lines
The scheduling of the workshop material handling system is an important part of the production control system in a manufacturing enterprise's flow workshop. Timely and efficient material scheduling can effectively improve production efficiency and economic benefits. In the actual production process, random events may occur that make the workshop material handling system dynamic. In order to respond dynamically to changes in the state of the assembly line and to effectively balance the production efficiency and energy consumption of mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm. The real-time state information of the manufacturing system includes all state characteristic information of the system at a given moment. Considering that the complexity of the system makes it difficult to cover all system states, and in order to simplify the model, ensure the accuracy of the decision-making model, and apply reinforcement learning effectively, this paper selects the current real-time information, the forward-looking information of the system, and the slack time of each part as the system state features used in the scheduling decision model. Five action groups are set up according to the number of transported parts and the transport sequence of multiple parts. The calculation of the transport scheduling plan for each action group of a multi-load trolley is divided into three steps: selecting the transport task, calculating the start time, and coordinating the start time point. The reward and punishment function fed back by the system covers three dimensions: out-of-stock time, handling distance, and line-side inventory of parts. These are given different weights according to the optimization goal, in order to realize the multi-objective optimization of minimizing the travel distance of multi-load trolleys and the line-side inventory of each part while satisfying the on-time delivery of parts to the assembly line as much as possible. To solve the problem that the Q table becomes too large, this paper proposes an improved two-parameter greedy strategy selection method and, on the basis of the greedy strategy, introduces an LSTM neural network to fit the Q values, approximating the Q-value function with the LSTM network in order to balance faster convergence against premature convergence. Arena simulation software is used to build a simulation system for a mixed-flow automobile assembly line, and the performance of different scheduling methods is compared under different product ratios. The simulation results show that the optimization effect of the modified Q-learning algorithm is better than that of other scheduling strategies: it effectively reduces the handling distance while ensuring that materials are delivered to the assembly line on time to achieve maximum output. At the same time, the computation time consumed by the reinforcement learning scheduling method for a single scheduling decision is significantly less than that of other methods, showing good real-time response capability and meeting the real-time requirements that the actual production environment places on the scheduling method of the material handling system.
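The weighted three-dimension reward and the two-parameter greedy selection described above can be sketched as follows. This is a minimal illustrative interpretation, not the paper's exact formulation: the weight values, the precise roles of the two greedy parameters, and all function names are assumptions, and the paper additionally approximates Q-values with an LSTM network rather than the plain table used here.

```python
import random

# Hypothetical weights for the three feedback dimensions named in the
# abstract (out-of-stock time, handling distance, line-side inventory);
# the paper's actual weights and scaling are not given.
W_STOCKOUT, W_DISTANCE, W_INVENTORY = 0.5, 0.3, 0.2

def reward(stockout_time, handling_distance, line_side_inventory):
    """Weighted penalty combining the three feedback dimensions.

    Larger stockout time, handling distance, or line-side inventory all
    lower the reward, matching the multi-objective minimization goal.
    """
    return -(W_STOCKOUT * stockout_time
             + W_DISTANCE * handling_distance
             + W_INVENTORY * line_side_inventory)

def select_action(q_values, epsilon1, epsilon2):
    """Two-parameter greedy selection (one possible interpretation).

    With probability epsilon1, explore uniformly at random; otherwise,
    with probability epsilon2, pick the second-best action to avoid
    premature convergence; else exploit the best-known action.
    """
    actions = sorted(q_values, key=q_values.get, reverse=True)
    if random.random() < epsilon1:
        return random.choice(actions)      # broad exploration
    if len(actions) > 1 and random.random() < epsilon2:
        return actions[1]                  # mild exploration near the optimum
    return actions[0]                      # exploitation

def q_update(q_values, action, r, max_next_q, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for the chosen action."""
    q_values[action] += alpha * (r + gamma * max_next_q - q_values[action])
```

In the paper's method the tabular `q_values` lookup would be replaced by an LSTM forward pass over the state features (real-time information, forward-looking information, and part slack times), with the same update target used as the regression label.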

Keywords: shop floor material handling system; reinforcement learning; Q-learning; hybrid policy

夏蓓鑫、顾嘉怡、田童、袁杰、彭运芳


School of Management, Shanghai University, Shanghai 200444, China


Funding: National Natural Science Foundation of China (71801147); Shanghai Pujiang Program (22PJC051)

2024

运筹与管理 (Operations Research and Management Science)
Operations Research Society of China


Indexed in: CSTPCD; CHSSCD; Peking University Core (北大核心)
Impact factor: 0.688
ISSN:1007-3221
Year, Volume (Issue): 2024, 33(6)