MPIFS: A Distributed File System Based on HDFS

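Abstract (translated from the Chinese): Traditional MPI (Message Passing Interface) computing moves data to the computation, which is an inherent weakness for tasks with very large data volumes. This paper presents the architecture and implementation of MPIFS, a distributed file system that supports MPI. MPIFS is built on HDFS (Hadoop Distributed File System), so that MPI running on MPIFS can support both compute-intensive and data-intensive workloads. Two types of batch word-frequency experiments were set up; all required data are stored in distributed form in MPIFS and accessed through the unified data interface that the system provides. One compute node processing a local file of size m takes the same time as n compute nodes processing a file of size n × m in parallel. When the total amount of data in MPIFS is held constant, reducing the number of compute nodes increases the computing time t. These results indicate that the MPIFS architecture is feasible and can support MPI in parallel computation where the computation migrates to the data.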

English title: A distributed file system MPIFS based on HDFS

English abstract (as published): Traditional MPI (Message Passing Interface) computing is characterized by the migration of data to the computation, which is an inherent shortcoming for computing tasks with a large amount of data. This paper proposes the architecture and implementation of MPIFS, a distributed file system that supports MPI. The file system is based on HDFS (Hadoop Distributed File System), enabling MPI to support both computation-intensive and data-intensive computations on MPIFS. Two types of batch word-frequency statistics experiments are set up. All the data required for the experiments are stored in distributed form in the MPIFS file system, and data access is achieved by calling the unified data interface provided by the system. In the experiments, a single computing node processes a local file of size m in the same time as n nodes process a file of size n × m in parallel. When the total amount of files in MPIFS remains unchanged and the number of computing nodes decreases, the computing time t becomes longer. It can be concluded that the MPIFS architecture is feasible and can support MPI in parallel computation where the computation migrates to the data.
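The word-frequency experiment described in the abstract follows a count-locally, then-merge pattern. The sketch below illustrates only that pattern in plain Python; it does not use MPI, HDFS, or the MPIFS data interface, whose API the abstract does not describe. The names `count_words`, `merge_counts`, and the `shards` dictionary (standing in for file blocks held by two nodes) are hypothetical.

```python
from collections import Counter


def count_words(text: str) -> Counter:
    """Count word frequencies in one node's local shard (hypothetical helper)."""
    return Counter(text.lower().split())


def merge_counts(partials) -> Counter:
    """Combine per-node partial counts into a global count (the reduce step)."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts, it does not overwrite
    return total


# Assumption: each key is a node id and each value is the text stored locally
# on that node. Real shards would be file blocks, not short strings.
shards = {
    0: "mpi moves data to compute",
    1: "mpifs moves compute to data",
}

# Each node counts only its own shard; only the small partial counts are merged.
partials = [count_words(text) for text in shards.values()]
totals = merge_counts(partials)
print(totals)
```

Only the per-node counters cross node boundaries in this pattern, not the raw text, which is the sense in which the computation moves to the data.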

English keywords:

MPI; distributed file systems; distributed parallel computation; computation migration

Authors:

Chen Zhuohang (陈卓航), Chen Yaqin (陈雅琴), Guo Zhiyong (郭志勇)

Affiliation:

School of Computer Science and Engineering, Southwest Minzu University, Chengdu 610225

Keywords:

MPI; distributed file system; distributed parallel computing; computation migration

Funding:

Fundamental Research Funds for the Central Universities

Project number:

2023NYXXS036

Publication year:

2024

DOI:

10.19352/j.cnki.issn1671-4679.2024.01.002

Journal of Heilongjiang Institute of Technology (黑龙江工程学院学报)

Heilongjiang Institute of Technology

Impact factor: 0.414

ISSN: 1671-4679

Year, volume (issue): 2024, 38(1)

Number of references: 7