HTDcr: a job execution framework for high-throughput computing on supercomputers

Jiazhi JIANG¹, Dan HUANG¹, Hu CHEN², Yutong LU¹, Xiangke LIAO¹

Author information

1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
2. School of Software Engineering, South China University of Technology, Guangzhou 510006, China

Abstract

High-throughput computing (HTC) is a computing paradigm that aims to accomplish jobs by breaking them into smaller, independent components. However, it requires a large amount of computing power over a long period. Most existing HTC frameworks are job-oriented and support neither coscheduling with the hardware architecture nor task-level execution. In addition, most of these frameworks reach only a limited scale, and their usability needs further improvement. Herein, we present HTDcr, a job execution framework for HTC on supercomputers. This study aims to improve the throughput, task dispatching, and usability of the framework. In detail, the throughput optimizations include a carefully designed task management system, a hierarchical scheduler, and the co-optimization of the task-scheduling strategy with the application and hardware characteristics. The usability optimizations include a programmable execution workflow, mechanisms for more robust and reliable quality of service, and a fine-grained resource allocation system for the colocation of multiple jobs. According to our evaluations, HTDcr achieves outstanding scalability and high throughput on large-scale clusters for HTC workloads. We evaluate HTDcr with several microbenchmarks and real-world applications on Tianhe-2 and Sunway TaihuLight to demonstrate the effects of its design mechanisms. For instance, for two real-world applications, task scheduling that integrates the application and hardware characteristics achieves 1.7× and 1.9× speedups over the basic task-scheduling strategy.

Key words

high-throughput computing; supercomputer; task scheduling; middleware; password guessing

Funding

National Key R&D Program of China (2021YFB0301300)

National Natural Science Foundation of China (U1811461)

Zhejiang Lab (2021KC0AB04)

Major Program of Guangdong Basic and Applied Research (2019B030302002)

Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211)

Publication year

2024

Science China Information Sciences (English edition)

Chinese Academy of Sciences

CSTPCD, EI

Impact factor: 0.715

ISSN: 1674-733X

Number of references: 1
