Libfork: Portable Continuation-Stealing With Stackless Coroutines

Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved because continuation stealing is difficult to implement in traditional High Performance Computing (HPC) languages, where it is often impossible without modifying the compiler or resorting to non-portable techniques. We demonstrate how stackless coroutines (a new feature in C++20) can enable fully portable continuation stealing, and we present libfork, a wait-free, fine-grained parallelism library that combines coroutines with user-space, geometric segmented stacks. We show that our approach achieves optimal time and memory scaling, both theoretically and empirically, across a variety of benchmarks. Compared to OpenMP (libomp), libfork is on average $7.2\times$ faster and consumes $10\times$ less memory. Similarly, compared to Intel's TBB, libfork is on average $2.7\times$ faster and consumes $6.2\times$ less memory. Additionally, we introduce non-uniform memory access (NUMA) optimizations for schedulers that achieve performance matching that of busy-waiting schedulers.
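The abstract pairs stackless coroutines with user-space, geometric segmented stacks. The following is a minimal sketch of the segmented-stack idea only. It assumes memory is allocated and freed in strict LIFO order, as happens when nested fork-join calls run serially. The class `segmented_stack` and its members are hypothetical names chosen for illustration and do not describe libfork's actual API or internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of a geometric segmented stack (not libfork's code).
// Memory is handed out in strict LIFO order; when the active segment is full,
// the stack moves to a new segment at least twice as large as the previous one.
class segmented_stack {
 public:
  explicit segmented_stack(std::size_t first_size = 4096) {
    segs_.push_back(make_segment(round_up(first_size)));
  }

  // Push `n` bytes; sizes are rounded up to alignof(std::max_align_t).
  void* allocate(std::size_t n) {
    n = round_up(n);
    while (segs_[cur_].used + n > segs_[cur_].size) {
      if (cur_ + 1 == segs_.size()) {
        // No spare segment: grow geometrically (double, or more if n is larger).
        segs_.push_back(make_segment(std::max(2 * segs_[cur_].size, n)));
      } else if (segs_[cur_ + 1].size < n) {
        // Cached spare segment is too small for this request: drop it and retry.
        segs_.erase(segs_.begin() + static_cast<std::ptrdiff_t>(cur_ + 1), segs_.end());
        continue;
      }
      ++cur_;  // the next segment is empty and large enough for n bytes
    }
    segment& s = segs_[cur_];
    void* p = s.data.get() + s.used;
    s.used += n;
    return p;
  }

  // Pop the most recent allocation; `p` and `n` must match it (LIFO order).
  void deallocate(void* p, std::size_t n) {
    n = round_up(n);
    segment& s = segs_[cur_];
    assert(s.used >= n && static_cast<std::byte*>(p) == s.data.get() + s.used - n);
    s.used -= n;
    // Step back to the previous segment but keep this one cached, so pushing
    // and popping across a segment boundary does not allocate every time.
    if (s.used == 0 && cur_ > 0) --cur_;
  }

  bool empty() const { return cur_ == 0 && segs_[0].used == 0; }
  std::size_t segment_count() const { return segs_.size(); }

 private:
  struct segment {
    std::unique_ptr<std::byte[]> data;
    std::size_t size;
    std::size_t used;
  };

  static constexpr std::size_t align = alignof(std::max_align_t);
  static std::size_t round_up(std::size_t n) { return (n + align - 1) / align * align; }
  static segment make_segment(std::size_t size) {
    return segment{std::make_unique<std::byte[]>(size), size, 0};
  }

  std::vector<segment> segs_;  // each segment is its own heap block, so pointers stay valid
  std::size_t cur_ = 0;        // index of the active (top) segment
};
```

In a coroutine-based design, an allocator like this could back a promise type's `operator new` and `operator delete`, so that coroutine frames come from the segmented stack rather than the general-purpose heap. Doubling the segment size keeps the number of segment allocations roughly logarithmic in the peak stack size. Caching the emptied segment avoids repeated allocation when execution oscillates around a segment boundary.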

Keywords: Parallel processing; Computers; Switched mode power supplies; Processor scheduling; Memory management; Libraries; Transistors; Synchronization; Message systems; Concurrent computing

Conor J. Williams, James Elliott

Department of Materials Science and Metallurgy, University of Cambridge, Cambridge, U.K.

2025

IEEE Transactions on Parallel and Distributed Systems

Indexed in: SCI
Year, volume (issue): 2025, 36(5)