Using the Proximal Policy Optimization Algorithm with Continuous Actions to Solve the Capacitated Lot-Sizing Problem

This paper studies the capacitated lot-sizing problem for a multi-product, single-machine system, with the objective of minimizing total production cost (production cost, inventory cost, machine setup cost, and backlog cost). The problem is formulated as a Markov decision process and solved with a deep reinforcement learning algorithm based on proximal policy optimization. Because deep reinforcement learning with a discrete action space is difficult to scale to large problems, a mapping function is added to the policy network so that deep reinforcement learning with continuous actions can be applied to this problem. Experiments show that the designed algorithm requires less training time, produces results close to the optimal solutions obtained directly with CPLEX, and has an advantage in solution speed.

Keywords: capacitated lot-sizing problem; deep reinforcement learning; Markov decision process; continuous action space; proximal policy optimization
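
The abstract does not state the form of the mapping function or the exact cost model, so the following is only a minimal illustrative sketch, not the authors' method. It assumes a sigmoid squashing of the raw policy output, a proportional rescaling to respect machine capacity, and no setup times; the names `map_action` and `step_cost` and all parameters are hypothetical. It shows how a continuous action could be turned into feasible integer lot sizes and how the four cost terms named in the abstract could be combined into a one-period cost (whose negative would serve as the reward).

```python
import math


def map_action(raw_action, capacity, unit_time):
    """Map a raw continuous action (one real number per product) to integer
    production quantities whose total processing time fits within capacity.

    Assumption: setup times are ignored to keep the sketch short.
    """
    # Squash each component into (0, 1); clamp to avoid overflow in exp().
    fractions = [1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, a))))
                 for a in raw_action]
    # Interpret each fraction as a requested share of machine time.
    time_req = [f * capacity for f in fractions]
    total = sum(time_req)
    # Scale all requests down proportionally if they exceed capacity.
    scale = min(1.0, capacity / total) if total > 0 else 0.0
    # Round down so the capacity constraint still holds after rounding.
    return [int(math.floor(t * scale / u)) for t, u in zip(time_req, unit_time)]


def step_cost(inventory, demand, quantities,
              unit_cost, setup_cost, holding_cost, backlog_cost):
    """Advance inventory by one period and return (new_inventory, cost).

    Negative inventory represents backlogged demand.
    """
    new_inventory = [i + q - d for i, q, d in zip(inventory, quantities, demand)]
    cost = 0.0
    for k, q in enumerate(quantities):
        cost += unit_cost[k] * q                             # production cost
        cost += setup_cost[k] if q > 0 else 0.0              # machine setup cost
        cost += holding_cost[k] * max(new_inventory[k], 0)   # inventory cost
        cost += backlog_cost[k] * max(-new_inventory[k], 0)  # backlog cost
    return new_inventory, cost


if __name__ == "__main__":
    q = map_action([0.0, 0.0], capacity=100.0, unit_time=[1.0, 2.0])
    inv, cost = step_cost([0, 0], [40, 30], q,
                          [1.0, 1.0], [10.0, 10.0], [0.5, 0.5], [2.0, 2.0])
    print(q, inv, cost)
```

In a reinforcement learning setting, a function such as `map_action` would sit between the policy network's output and the environment, so that the action dimension grows with the number of products rather than with the number of discrete quantity combinations.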

Zhang Tianji (章天吉), Lin Wenwen (林文文), Zhang Yuejun (张岳君), Xiang Wei (项薇), Zhan Taoyang (战韬阳)

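Faculty of Mechanical Engineering and Mechanics, Ningbo University, Ningbo, Zhejiang 315211

AUX Group Co., Ltd., Ningbo, Zhejiang 315191

School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240

School of Mechanical and Electrical Engineering, Zhejiang Business Technology Institute, Ningbo, Zhejiang 315699

Yangming College, Ningbo University, Ningbo, Zhejiang 315211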

2024

Machine Design and Research (机械设计与研究)
Shanghai Jiao Tong University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.531
ISSN: 1006-2343
Year, volume (issue): 2024, 40(1)