
Persistent self-learning solution algorithm for the inventory routing problem with fluctuating demand

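The bike-sharing inventory routing problem is an inventory routing problem constrained by the total quantity of goods and subject to periodically fluctuating demand. Its optimization must jointly consider resource utilization and scheduling cost, and for large-scale instances it is difficult to guarantee both solution efficiency and solution quality. To address these challenges, the problem is formalized as a Markov process of multi-objective sequential decision-making, a time-series mixed integer programming model is established, and a global persistent self-learning algorithm is proposed. The algorithm consists of three stages: offline learning, online planning, and persistent learning. The offline learning stage uses a multi-agent cooperative algorithm based on a stochastic policy to obtain quantitative descriptions of the spatiotemporal distribution of delivery vehicles and of the demand patterns of demand points. The online planning stage predicts the demand pattern in each time step from historical order data to determine the optimal inventory allocation quantities, and schedules the supplier's delivery vehicles using the quantitative information from the offline learning stage. The persistent learning stage, at the end of each processing cycle, uses the recorded order data to evaluate and improve the scheduling results of that cycle. Experiments on real enterprise data show that the proposed algorithm outperforms the comparison methods on comprehensive evaluation metrics that include demand prediction model complexity, solution quality, total number of dispatched vehicles, total dispatch distance, and degree of station improvement. In addition, a comparative analysis of multiple strategies summarizes how costs vary in the inventory problem and verifies the effectiveness of the algorithm on large-scale instances.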
Persistent self-learning algorithm for inventory routing problem with periodic demand fluctuation
The inventory routing problem of bike-sharing systems involves periodic demand fluctuations and product volume restrictions. Its optimization requires balancing resource utilization against scheduling costs, and it is difficult to guarantee both solving efficiency and solution quality on large-scale instances. To address these challenges, the problem was formalized as a multi-objective sequential decision-making Markov process. A time-series-based mixed integer programming model was established, and a global persistent self-learning algorithm was proposed. The algorithm consists of three stages: offline learning, online planning, and persistent learning. In the offline learning stage, a multi-agent cooperative algorithm based on a random strategy was designed to obtain the spatiotemporal distribution of vehicles and a quantitative description of demand patterns. In the online planning stage, the demand pattern of each site in each time step was predicted from historical order data to determine the optimal inventory allocation quantity, and the vehicles were dispatched using the quantitative information obtained in the offline learning stage. In the persistent learning stage, the dispatching results were evaluated and improved at the end of each processing cycle using the order data recorded within that cycle. Experiments based on real enterprise data showed that the proposed method was superior to the comparison methods on comprehensive evaluation indexes such as the complexity of the site demand prediction model, solution quality, the total number of dispatched vehicles, total dispatch distance, and station improvement degree. In addition, through a comparative analysis of various strategies, the cost variation trend of the problem was summarized, and the algorithm's effectiveness on large-scale instances was verified.
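As a rough illustration of the online planning and persistent learning ideas in the abstract, the Python sketch below forecasts per-station demand from recorded orders and splits a fixed fleet across stations. It is not the authors' algorithm. Exponential smoothing, proportional allocation with largest-remainder rounding, the function names `update_forecast` and `allocate`, the station labels, and the smoothing factor `alpha` are all assumptions made for illustration.

```python
from typing import Dict


def update_forecast(
    forecast: Dict[str, float], observed: Dict[str, int], alpha: float = 0.5
) -> Dict[str, float]:
    """Blend the previous forecast with the orders recorded in the last cycle.

    Exponential smoothing is an assumed stand-in for the demand-pattern
    prediction mentioned in the abstract.
    """
    return {s: alpha * observed[s] + (1 - alpha) * forecast[s] for s in forecast}


def allocate(forecast: Dict[str, float], total_bikes: int) -> Dict[str, int]:
    """Split a fixed fleet across stations in proportion to forecast demand.

    The total is always preserved (total-quantity constraint); fractional
    shares are resolved by largest-remainder rounding.
    """
    total_demand = sum(forecast.values())
    if total_demand <= 0:
        # No demand signal: spread the fleet evenly.
        shares = {s: total_bikes / len(forecast) for s in forecast}
    else:
        shares = {s: total_bikes * d / total_demand for s, d in forecast.items()}
    alloc = {s: int(share) for s, share in shares.items()}
    leftover = total_bikes - sum(alloc.values())
    # Hand the remaining bikes to the stations with the largest fractional part.
    by_remainder = sorted(shares, key=lambda s: shares[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


if __name__ == "__main__":
    forecast = {"A": 10.0, "B": 30.0}           # hypothetical initial forecast
    print(allocate(forecast, 8))                # plan for the next cycle
    forecast = update_forecast(forecast, {"A": 20, "B": 10})  # orders recorded in that cycle
    print(allocate(forecast, 7))                # re-plan with the updated forecast
```

The loop of planning, recording orders, and updating the forecast mirrors the cycle structure described above only loosely; vehicle routing and the offline multi-agent stage are left out entirely.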

inventory routing; periodic and cyclic fluctuation of demand; reinforcement learning; online planning; persistent learning

Guo Yuhan (郭羽含), Li Jinning (李津宁), Shen Xueli (沈学利)


School of Science, Zhejiang University of Science and Technology, Hangzhou, Zhejiang 310023, China

School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China

inventory routing; periodic fluctuation of product demand; reinforcement learning; online planning; persistent learning

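National Natural Science Foundation of China; Natural Science Foundation of Liaoning Province; Basic Research Project of the Education Department of Liaoning Province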

61404069; 2019-ZD-0048; LJ2019JL012

2024

Computer Integrated Manufacturing Systems
No. 210 Research Institute, China North Industries Group Corporation


CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.092
ISSN:1006-5911
Year, volume (issue): 2024, 30(4)