Predictive resource allocation: unsupervised learning of Markov decision processes
When future information about a mobile user, such as the trajectory, is known, predictive resource allocation for video-on-demand service can reduce the energy consumption of base stations or increase network throughput while ensuring the user experience. Traditional methods for predictive resource allocation first predict user information (say, trajectory) and then optimize resource (say, power) allocation. However, the prediction accuracy degrades as the prediction horizon increases. To deal with this issue, several recent works employed deep reinforcement learning for online decision-making by formulating the predictive resource allocation problem as a Markov decision process (MDP). However, for this kind of MDP problem, which is appropriately solved by reinforcement learning, existing works design the state in a trial-and-error manner. For constrained optimization problems, most existing reinforcement learning methods for wireless problems add penalty terms to the reward function, with manually adjusted hyper-parameters, to satisfy the constraints. This paper proposes an unsupervised deep learning method for online predictive resource allocation in an end-to-end manner, which jointly predicts information and optimizes resource allocation. The proposed method improves the performance of predictive resource allocation through online end-to-end unsupervised deep learning, and can systematically design the state of the MDP and satisfy complex constraints, so that tedious trial-and-error procedures for designing the state and satisfying the constraints are no longer necessary. We analyze the relationship between unsupervised deep learning and deep reinforcement learning. Simulation results show that the proposed method consumes almost the same energy as deep reinforcement learning while simplifying the state design process, which verifies the theoretical analysis.
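The penalty-based handling of constraints that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not code from the paper: the function name, the hinge-style violation measure, and the penalty weights are all hypothetical, standing in for the manually tuned hyper-parameters the abstract mentions.

```python
def shaped_reward(energy_cost, constraint_violations, penalty_weights):
    """Fold constraints into the RL reward via weighted penalty terms.

    energy_cost           -- per-step cost the agent minimizes (e.g., BS energy)
    constraint_violations -- how far each constraint is exceeded this step
    penalty_weights       -- manually tuned hyper-parameters (the pain point:
                             too small and constraints are violated, too large
                             and the energy objective is neglected)
    """
    penalty = sum(w * max(0.0, v)
                  for w, v in zip(penalty_weights, constraint_violations))
    return -energy_cost - penalty

# Example step: a quality-of-service constraint (e.g., playback buffer
# underflow) is violated by 0.3, penalized with weight 10.0.
r = shaped_reward(energy_cost=2.0,
                  constraint_violations=[0.3],
                  penalty_weights=[10.0])
# r = -2.0 - 10.0 * 0.3 = -5.0
```

The need to hand-tune such weights for each scenario is precisely what the proposed end-to-end unsupervised method aims to eliminate by handling the constraints systematically.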