COURIER: Edge Computing Task Scheduling and Offloading Method Based on Non-preemptive Priorities Queuing and Prioritized Experience Replay DRL
Edge computing (EC) deploys a large number of computing and storage resources at the edge of the network to meet the latency and power consumption requirements of tasks. Computation offloading is one of the key technologies in EC. When estimating task queuing delay, existing computation offloading methods usually use the M/M/1/∞/∞/FCFS or M/M/n/∞/∞/FCFS model. Because these models ignore the priority of highly delay-sensitive tasks, delay-insensitive computation tasks can occupy computing resources for long periods, which increases the delay cost of these methods. Meanwhile, most existing experience replay methods sample experiences uniformly at random, which cannot distinguish valuable experiences from poor ones, resulting in low experience utilization and slow neural network convergence. Finally, computation offloading methods based on deterministic-policy deep reinforcement learning (DRL) suffer from weak environment exploration, low robustness, and low experience utilization, which reduces the accuracy of the solution to the computation offloading problem. To solve these problems, this paper considers a computation offloading scenario with multi-task mobile devices and multiple edge servers, studies the task scheduling and offloading decision problems with the aim of minimizing system delay and energy consumption, and proposes COURIER (computation offloading qUeuing and pRioritIzed experience replay DRL). COURIER first designs a non-preemptive priority queuing model (M/M/n/∞/∞/NPR) to optimize the queuing delay of tasks. Then, it proposes a maximum entropy deep reinforcement learning algorithm based on prioritized experience replay. For the offloading decision problem, an offloading decision mechanism based on the soft actor-critic (SAC) algorithm with prioritized experience replay is proposed. In this mechanism, information entropy is added to the objective function so that the agent adopts a stochastic policy, and the experience sampling method is optimized to accelerate network convergence. Simulation results show that COURIER can effectively reduce system delay and energy consumption.
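
To illustrate why a non-preemptive priority (NPR) queue helps delay-sensitive tasks, the following minimal sketch computes the mean queuing delay per priority class in an M/M/n queue using the standard textbook result for non-preemptive priorities. It assumes that all classes share the same exponential service rate, which the abstract does not state. The function names `erlang_c` and `npr_mean_waits` are hypothetical and are not taken from the paper.

```python
from math import factorial


def erlang_c(n, offered_load):
    """Probability that an arriving task finds all n servers busy (Erlang C)."""
    rho = offered_load / n
    if rho >= 1:
        raise ValueError("queue is unstable: total utilization must be < 1")
    top = offered_load ** n / factorial(n) / (1 - rho)
    bottom = sum(offered_load ** k / factorial(k) for k in range(n)) + top
    return top / bottom


def npr_mean_waits(arrival_rates, mu, n):
    """Mean queuing delay per class in a non-preemptive priority M/M/n queue.

    arrival_rates[0] is the highest-priority class. Assumes (illustrative
    simplification) that every class has the same service rate mu.
    """
    p_wait = erlang_c(n, sum(arrival_rates) / mu)
    base = p_wait / (n * mu)  # mean residual time until some server frees up
    waits, sigma_prev = [], 0.0
    for lam in arrival_rates:
        sigma = sigma_prev + lam / (n * mu)  # cumulative load up to this class
        waits.append(base / ((1 - sigma_prev) * (1 - sigma)))
        sigma_prev = sigma
    return waits


if __name__ == "__main__":
    # Two classes on two servers: class 0 is delay-sensitive.
    print(npr_mean_waits([0.5, 0.5], mu=1.0, n=2))
```

With a single class the formula reduces to the usual FCFS M/M/n waiting time. With several classes, the arrival-rate-weighted average delay stays the same as under FCFS, but higher-priority classes wait less at the expense of lower-priority ones.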
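
The contrast between uniform random replay and prioritized experience replay can be sketched as follows. This is a generic proportional-prioritization buffer in the commonly used form (sampling probability proportional to priority raised to `alpha`, with importance-sampling weights controlled by `beta`). It is not the paper's implementation: the class name, the default hyperparameters, and the use of the absolute TD error as the priority are assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, list-based)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives fully prioritized
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update(self, indices, td_errors):
        # Assumed priority signal: magnitude of the TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a SAC-style training loop, the returned weights would typically scale the per-sample critic loss, and `update` would be called with the new TD errors after each gradient step.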