Q-learning-based Algorithm for Orchestrating Security Service Function Chain

刘行¹, 郭靓¹, 王正琦¹, 韦小刚¹, 徐雪菲¹, 刘京²

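Author information

1. NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing, Jiangsu 210000; Nanjing NARI Information and Communication Technology Co., Ltd., Nanjing, Jiangsu 210000
2. State Grid Shandong Electric Power Research Institute, Jinan, Shandong 250003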
Abstract

As the Internet has become an indispensable part of daily life, network security has grown increasingly important, and the orchestration of dynamic security service function chains is an important research direction for ensuring it. However, current research on network resource mapping and orchestration algorithms for dynamic security service function chains mainly focuses on a single type of network resource, typically aiming to optimize that resource and reduce network service latency, and overlooks the balance of overall resource allocation in the network. This paper constructs a physical network model and a security service function chain model and, while meeting user requirements, considers both the computing resources of physical network nodes and the bandwidth resources of links, with the goal of achieving the most balanced allocation of network resources. Based on the Q-learning reinforcement learning algorithm, a new reward scheme for link orchestration is proposed, and a greedy strategy is introduced to avoid falling into local optima. A typical physical network model and different numbers of security service function chains to be orchestrated are selected, and the optimal orchestration path of each security service function chain is obtained through multiple iterations. Simulation results show that, compared with the simulated annealing algorithm, the proposed orchestration reduces orchestration response time by 38.5% and improves resource allocation balance by 2.1%; compared with the genetic algorithm, it reduces orchestration response time by 96.5% and improves resource allocation balance by 2.9%.
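The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch rather than the authors' algorithm. It shows tabular Q-learning choosing a path through a small physical network while favoring hops with more free link bandwidth and node compute. Everything concrete here is an assumption made for illustration: the five-node topology, the free-resource values, the reward formula, the use of an epsilon-greedy rule as the "greedy strategy", and all function names and hyperparameters.

```python
import random

# Hypothetical physical network: fraction of bandwidth still free on each link.
FREE_BW = {
    (0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.5,
    (1, 3): 0.8, (2, 3): 0.9, (2, 4): 0.1, (3, 4): 0.7,
}
# Hypothetical fraction of compute still free on each node.
FREE_CPU = {0: 1.0, 1: 0.8, 2: 0.3, 3: 0.9, 4: 1.0}


def neighbors(node):
    """Nodes directly linked to `node` (links are undirected)."""
    out = []
    for u, v in FREE_BW:
        if u == node:
            out.append(v)
        elif v == node:
            out.append(u)
    return sorted(out)


def free_bw(u, v):
    """Free bandwidth of the undirected link between u and v."""
    return FREE_BW.get((u, v), FREE_BW.get((v, u)))


def reward(u, v, dst):
    """Assumed reward: each hop costs more when the link and the next node
    are heavily loaded; reaching the destination earns a fixed bonus."""
    step = free_bw(u, v) + FREE_CPU[v] - 2.0  # always <= 0
    return step + 10.0 if v == dst else step


def train(src, dst, episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    q = {(u, v): 0.0 for u in FREE_CPU for v in neighbors(u)}
    for _ in range(episodes):
        node = src
        for _ in range(20):  # cap on hops per episode
            actions = neighbors(node)
            if rng.random() < epsilon:  # explore
                nxt = rng.choice(actions)
            else:  # exploit the current estimate
                nxt = max(actions, key=lambda a: q[(node, a)])
            future = 0.0 if nxt == dst else max(q[(nxt, a)] for a in neighbors(nxt))
            target = reward(node, nxt, dst) + gamma * future
            q[(node, nxt)] += alpha * (target - q[(node, nxt)])
            node = nxt
            if node == dst:
                break
    return q


def greedy_path(q, src, dst, max_hops=10):
    """Follow the highest-valued action from src until dst is reached."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        node = path[-1]
        path.append(max(neighbors(node), key=lambda a: q[(node, a)]))
    return path


if __name__ == "__main__":
    q_table = train(0, 4)
    print(greedy_path(q_table, 0, 4))
```

With these assumed numbers, the shortest route 0, 2, 4 crosses the most heavily loaded links and node, so the learned policy should prefer the longer but less loaded route through nodes 1 and 3. A fuller treatment would also place each security function on a node and update the free resources after every chain is embedded, which this sketch omits.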

Key words

network security/security service function chain/Q-learning/greedy strategy/resource allocation

Publication year

2024

Computer and Modernization (计算机与现代化)

Jiangxi Computer Society; Jiangxi Institute of Computing Technology

CSTPCD

Impact factor: 0.472

ISSN: 1006-2475
