Libfork: Portable Continuation-Stealing With Stackless Coroutines

Abstract: Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time-scaling and strong bounds on memory scaling. The latter is rarely achieved due to the difficulty of implementing continuation-stealing in traditional High Performance Computing (HPC) languages, where it is often impossible without modifying the compiler or resorting to non-portable techniques. We demonstrate how stackless coroutines (a new feature in C++20) can enable fully-portable continuation stealing and present libfork, a wait-free fine-grained parallelism library, combining coroutines with user-space, geometric segmented stacks. We show our approach is able to achieve optimal time/memory scaling, both theoretically and empirically, across a variety of benchmarks. Compared to OpenMP (libomp), libfork is on average $7.2\times$ faster and consumes $10\times$ less memory. Similarly, compared to Intel's TBB, libfork is on average $2.7\times$ faster and consumes $6.2\times$ less memory. Additionally, we introduce non-uniform memory access (NUMA) optimizations for schedulers that demonstrate performance matching busy-waiting schedulers.

Keywords:

Parallel processing; Computers; Switched mode power supplies; Processor scheduling; Memory management; Libraries; Transistors; Synchronization; Message systems; Concurrent computing

Authors:

Conor J. Williams, James Elliott

Author affiliation:

Department of Materials Science and Metallurgy, University of Cambridge, Cambridge, U.K.

Publication year:

2025

DOI:

10.1109/TPDS.2025.3543442

IEEE Transactions on Parallel and Distributed Systems

SCI

ISSN:

Year, volume (issue): 2025, 36(5)

Number of references: 43