Real-time Scheduling Method Based on Reinforcement Learning for Material Handling in Assembly Lines
The scheduling of the workshop material handling system is an important part of the production control system in a manufacturing enterprise's flow shop. Timely and efficient material scheduling can effectively improve production efficiency and economic benefit. In actual production, random events may make the workshop material handling system dynamic. To respond dynamically to changes in the state of the assembly line and to balance the production efficiency and energy consumption of mixed-model assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm. The real-time state of the manufacturing system comprises all of the system's state characteristics at a given moment. Because the system's complexity makes it impractical to cover all system states, and in order to simplify the model, preserve the accuracy of the decision model, and apply reinforcement learning effectively, this paper selects the current real-time information of the system, its forward-looking information, and the slack time of each part as the state features used in the scheduling decision model. Five action groups are defined according to the number of parts transported and the transport sequence when multiple parts are carried. For each action group, the transport scheduling plan of a multi-load trolley is computed in three steps: selecting the transport task, calculating the start time, and coordinating the start time point. The reward and punishment function fed back by the system covers three dimensions: stockout time, handling distance, and line-side inventory, which are assigned different weights according to the optimization goal. This realizes a multi-objective optimization that minimizes the travel distance of the multi-load trolleys and the line-side inventory of each part while satisfying on-time delivery of parts to the assembly line as far as possible. To address the problem that the Q table grows too large, this paper proposes an improved two-parameter greedy strategy for action selection and, on this basis, introduces an LSTM neural network to approximate the Q-value function, balancing faster convergence against premature convergence. Arena simulation software is used to build a simulation of an automobile mixed-model assembly line, and the performance of different scheduling methods is compared under different product ratios. The simulation results show that the improved Q-learning algorithm outperforms the other scheduling strategies: it effectively reduces the handling distance while ensuring that materials are delivered to the assembly line on time, achieving maximum output. Moreover, the computation time consumed by the reinforcement learning method for a single scheduling decision is significantly less than that of the other methods, showing good real-time response capability and meeting the real-time requirements that the actual production environment places on the scheduling of the material handling system.
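To make the abstract's learning loop concrete, the following is a minimal sketch of a tabular Q-learning update driven by a negatively weighted three-part reward (stockout time, handling distance, line-side inventory). The weight values, scaling, and state/action encodings are illustrative assumptions; the paper's actual parameters are not reproduced here.

```python
from collections import defaultdict

# Hypothetical weights for the three reward dimensions named in the abstract;
# the paper's actual weights depend on its optimization goal and are not given here.
W_STOCKOUT, W_DISTANCE, W_INVENTORY = 0.5, 0.3, 0.2

def reward(stockout_time, distance, inventory):
    """Negative weighted sum: the agent is penalized on all three objectives."""
    return -(W_STOCKOUT * stockout_time
             + W_DISTANCE * distance
             + W_INVENTORY * inventory)

# Tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
ALPHA, GAMMA = 0.1, 0.9
N_ACTIONS = 5  # five action groups, per the abstract

Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def q_update(state, action, r, next_state):
    """One temporal-difference backup of the Q table."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (r + GAMMA * best_next - Q[state][action])
```

With all Q values initialized to zero, a single update with `reward(2, 10, 4) = -4.8` moves `Q[state][action]` to `-0.48`, i.e. one-tenth (alpha) of the penalized return.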
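The abstract's "improved two-parameter greedy strategy" is not specified in detail; as a hedged illustration, the sketch below uses a standard annealed epsilon-greedy rule whose two parameters are the start and end exploration rates. The paper's actual mechanism may differ.

```python
import math
import random

def select_action(q_values, step, eps_start=1.0, eps_end=0.05, decay=1e-3):
    """Annealed epsilon-greedy action selection.

    Two parameters (eps_start, eps_end) bound the exploration rate, which
    decays exponentially with the training step: explore broadly early on,
    then exploit the learned Q values as epsilon approaches eps_end.
    This is an illustrative stand-in for the paper's two-parameter strategy.
    """
    eps = eps_end + (eps_start - eps_end) * math.exp(-decay * step)
    if random.random() < eps:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```

Late in training (large `step`), epsilon sits near `eps_end`, so the trolley almost always takes the action group with the highest estimated Q value; the LSTM approximator described in the abstract would replace the raw `q_values` lookup for states absent from the table.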
Keywords: shop floor material handling system; reinforcement learning; Q-learning; hybrid policy