Computer Applications and Software, 2024, Vol. 41, Issue 3: 16-21, 48. DOI: 10.3969/j.issn.1000-386x.2024.03.003

Design and Implementation of a Deep Learning Platform for a Two-Level Multi-Center Architecture

Cheng Zhonghan (程仲汉)

Author information

  • 1. Department of Computer and Information Security Management, Fujian Police College, Fuzhou 350007, Fujian, China

Abstract

Deep learning work in large enterprises suffers from scattered management and a large amount of redundant construction. To support whole-process management of large-scale deep learning and efficient reuse of trained models, a deep learning platform is proposed against the background of the two-level, multi-center deployment architecture of State Grid Corporation of China. The system distributes the management of training, inference, data, and models across different centers, which cooperate to close the deep learning loop. A private cloud built on Kubernetes supports the parallel computation of large batches of deep learning applications. The front-end interface uses operator-based flow orchestration to provide visual modeling and functional extensibility. Experimental results show that the system supports the parallel execution of multiple deep learning tasks and that the additional performance overhead is acceptable.

Key words

Deep learning platform / Two-level deployment / Multi-center / Kubernetes container cloud / Flow orchestration


Funding

Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JAT200379)

Publication year

2024
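Journal: Computer Applications and Software
Sponsors: Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology
Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.615
ISSN: 1000-386X
Number of references: 20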