
A Data-Free Distillation Framework for Decision Tree Adaptive Bitrate Algorithms

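Adaptive Bit-Rate (ABR) algorithms are a crucial technology in streaming video delivery. Based on factors such as current network conditions and playback state, an ABR algorithm selects a suitable bitrate for the next video chunk to ensure a good quality of experience (QoE) for the user. Learning-based ABR algorithms, which do not rely on traditional models and learn their policies from scratch, perform well and are gradually replacing heuristic ABR algorithms that require tedious tuning, becoming a research hotspot. However, these algorithms rely on neural network inference, which means many model parameters and a high overall computational cost, making them hard to deploy in practice. Previous work therefore proposed decision tree distillation: lightweight decision trees extract the expert policy of a learning-based ABR algorithm, and the trees are deployed online. The experiments in this paper show, however, that past distillation frameworks ignore the influence of the training environment on the distilled policy, which leads to poor generalization. This paper therefore proposes NIA (data-free Network-environmental Imitation-based rate Adaptation framework), a new data-free distillation framework for generating decision tree ABR algorithms that generalize better. NIA builds multiple artificial network environments with a network environment generation module, uses an environment selection module to pick suitable network scenarios before each training iteration, and then interacts with those scenarios, completing decision tree distillation with a student-driven imitation learning algorithm. The paper also designs a complete evaluation platform to test NIA. Experiments show that NIA delivers good QoE and generalization on a variety of bandwidth datasets: (1) compared with heuristic algorithms, it improves QoE by 1% to 46%; (2) compared with previous decision tree distillation schemes, it performs comparably in low-bandwidth scenarios but improves by nearly 100% in high-bandwidth scenarios; (3) its overall performance approaches or even exceeds that of learning-based algorithms (i.e., the expert policies).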
A Data-Free Distillation Framework for Adaptive Bitrate Algorithms
The Adaptive Bit-Rate (ABR) algorithm is a key technique in streaming video transmission. It selects an appropriate bit-rate for the next video chunk based on the current network conditions and playback status to ensure a high quality of experience (QoE) for the user. Learning-based ABR algorithms, which bypass traditional modeling and learn strategies from scratch, have achieved better performance and are gradually replacing heuristic ABR algorithms that require careful tuning, becoming a research hotspot in the field. However, these algorithms use neural network inference, which results in a large number of model parameters and high overall computational overhead, making them difficult to deploy in real-world scenarios. Previous works therefore proposed decision tree distillation schemes, which use lightweight decision trees to distill expert policies from learning-based ABR algorithms and deploy the trees online. However, the experiments in this paper show that previous distillation frameworks overlooked the influence of the training environment on the distilled policies, resulting in poor generalization capability. The paper proposes NIA (data-free Network-environmental Imitation-based rate Adaptation framework), a novel data-free distillation framework for generating decision tree ABR algorithms with better generalization. NIA generates and selects suitable artificial network environments for distilling decision tree policies. Specifically, NIA uses a network environment generation module to construct multiple artificial network environments. Before each training iteration, an environment selection module chooses the network environments in which the student model will distill the teacher policy. This process is modeled as a "no-regret online learning" problem in reinforcement learning. In detail, the module uses the Upper Confidence Bound (UCB) algorithm with a Top-K approach to select network environments from the pool produced by the network environment generation module, based on the current performance indicators of the student model and the environment. The selection process ensures performance improvement while maintaining lower bounds. NIA then interacts with the selected scenarios to complete the decision tree distillation process using a student-driven imitation learning algorithm. Based on the observed current state, the student model selects a bit-rate, while the same state is fed to the teacher model to obtain the expert policy. The module then distills the policy by reducing the distance between the student model's strategy and the expert strategy. The paper designs a comprehensive evaluation platform to test the performance of NIA. The results demonstrate that NIA exhibits good QoE performance and generalization performance on various bandwidth datasets: (1) compared to heuristic algorithms, it improves QoE metrics by 1% to 46%; (2) compared to previous decision tree distillation schemes, it performs equally well in low-bandwidth scenarios but improves QoE by nearly 100% in high-bandwidth scenarios; (3) its overall performance is close to or even surpasses that of learning-based algorithms (i.e., expert policies). Finally, the paper analyzes and compares the important parameter settings within the three modules of NIA, including the impact of student model and network environment parameters on NIA's performance, the number of generated environments, the Top-K ratio, the selection algorithm, and the exploration index in the algorithm. Through extensive experiments, each parameter of NIA is set within a range that achieves better results.
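The environment selection step described above can be illustrated with a minimal Python sketch of UCB scoring combined with a Top-K cut. This is not the paper's implementation: the function names, the exploration constant `c`, and the choice of reward (here imagined as something like the QoE gap between teacher and student in an environment) are all assumptions made for illustration, since the abstract only says the selection uses "performance indicators of the student model and the environment".

```python
import math


def ucb_top_k(mean_reward, pulls, total_rounds, k, c=1.0):
    """Return the indices of the k environments with the highest UCB score.

    Hypothetical scoring rule (standard UCB1 form, assumed here):
        score_i = mean_reward[i] + c * sqrt(ln(total_rounds) / pulls[i])
    Environments that were never selected (pulls[i] == 0) get an infinite
    score so that each one is tried at least once.
    """
    scored = []
    for i, (mu, n) in enumerate(zip(mean_reward, pulls)):
        if n == 0:
            score = math.inf
        else:
            bonus = c * math.sqrt(math.log(max(total_rounds, 1)) / n)
            score = mu + bonus
        scored.append((score, i))
    # Highest score first; ties are broken by the lower environment index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


def update(mean_reward, pulls, env, reward):
    """Fold one observed reward into the running mean of environment `env`."""
    pulls[env] += 1
    mean_reward[env] += (reward - mean_reward[env]) / pulls[env]
```

In a training loop under these assumptions, each iteration would call `ucb_top_k` to pick environments, run the student in them, and call `update` with the observed reward. The exploration bonus shrinks as an environment is selected more often, so the selection drifts toward environments where the measured reward stays high while still revisiting rarely used ones.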

video streaming; adaptive bitrate algorithm; data-free distillation

黄天驰、李朝阳、张睿霄、李文哲、孙立峰


Department of Computer Science and Technology, Tsinghua University, Beijing 100084

streaming media; adaptive bitrate algorithm; data-free distillation

National Natural Science Foundation of China

61936011

2024

Chinese Journal of Computers
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences


CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN:0254-4164
Year, volume (issue): 2024, 47(1)