Persistent self-learning algorithm for inventory routing problem with periodic demand fluctuation
The inventory routing problem of bike-sharing systems involves periodic demand fluctuations and product volume restrictions. Optimizing it requires jointly balancing resource utilization and scheduling costs, and it is difficult to guarantee solving efficiency and solution quality at the same time. To address this challenge, the problem was formalized as a multi-objective sequential Markov decision process. A time-series-based mixed integer programming model was established, and a global persistent self-learning algorithm was proposed. The algorithm consisted of three stages: offline learning, online planning, and persistent learning. In the offline learning phase, a multi-agent cooperative algorithm based on a random strategy was designed to obtain the spatiotemporal distribution of vehicles and a quantitative description of demand patterns. In the online planning phase, the spatiotemporal distribution pattern of each station in each time step was predicted from historical order data to determine the optimal inventory allocation quantity, and vehicles were dispatched using the quantitative information obtained in the offline learning phase. In the persistent learning phase, the dispatching results were continually evaluated and improved using the order data recorded within the processing cycle. Experiments on real data showed that the proposed method outperformed the comparison methods on comprehensive evaluation indexes such as the complexity of the station demand prediction model, solution quality, the total number of dispatched vehicles, total dispatch distance, and the degree of station improvement. In addition, a comparative analysis of various strategies summarized the cost variation trend of the problem and verified the algorithm's effectiveness on large-scale instances.
inventory routing; periodic and cyclic fluctuation of demand; reinforcement learning; online planning; persistent learning
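
The idea of predicting periodic station demand from historical orders, capping allocations by capacity, and refining the prediction with newly recorded orders can be illustrated with a minimal Python sketch. This is not the algorithm of the paper: the order format `(station, time_step, net_demand)`, the function names, the per-slot averaging, and the smoothing factor `alpha` are all assumptions chosen for illustration, and simple exponential smoothing stands in for the persistent learning step.

```python
from collections import defaultdict


def fit_periodic_demand(orders, period):
    """Average net demand per (station, slot), with slot = time_step % period.

    orders: iterable of (station, time_step, net_demand) tuples, where
    net_demand is rentals minus returns in that step (hypothetical format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for station, t, net in orders:
        key = (station, t % period)
        totals[key] += net
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def target_inventory(forecast, station, slot, capacity):
    """Stock to hold for a slot: expected net demand clipped to [0, capacity]."""
    expected = forecast.get((station, slot), 0.0)
    return min(capacity, max(0, round(expected)))


def persistent_update(forecast, new_orders, period, alpha=0.3):
    """Blend newly recorded orders into the forecast (exponential smoothing).

    Returns a new dict; the input forecast is left unchanged.
    """
    updated = dict(forecast)
    for station, t, net in new_orders:
        key = (station, t % period)
        old = updated.get(key, net)  # unseen keys start from the observation
        updated[key] = (1 - alpha) * old + alpha * net
    return updated


# Toy history with a period of two time steps (made-up numbers).
history = [
    ("A", 0, 4), ("A", 1, -2), ("A", 2, 6), ("A", 3, -4),
    ("B", 0, 1), ("B", 2, 3),
]
forecast = fit_periodic_demand(history, period=2)
stock_a0 = target_inventory(forecast, "A", 0, capacity=4)
refined = persistent_update(forecast, [("A", 0, 10)], period=2, alpha=0.5)
```

In this toy run, station A's slot-0 forecast exceeds the assumed capacity of 4, so the allocation is capped at the capacity, which mirrors the volume restriction mentioned in the abstract. Vehicle routing and the multi-agent offline learning stage are deliberately left out of the sketch.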