Libfork: Portable Continuation-Stealing With Stackless Coroutines
IEEE
Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved because continuation stealing is difficult to implement in traditional High Performance Computing (HPC) languages, where it is often impossible without modifying the compiler or resorting to non-portable techniques. We demonstrate how stackless coroutines (a new feature in C++20) can enable fully portable continuation stealing, and we present libfork, a wait-free, fine-grained parallelism library that combines coroutines with user-space, geometric segmented stacks. We show that our approach achieves optimal time and memory scaling, both theoretically and empirically, across a variety of benchmarks. Compared to OpenMP (libomp), libfork is on average 7.2× faster and consumes 10× less memory. Similarly, compared to Intel's TBB, libfork is on average 2.7× faster and consumes 6.2× less memory. Additionally, we introduce non-uniform memory access (NUMA) optimizations for schedulers that demonstrate performance matching busy-waiting schedulers.
Keywords: Parallel processing; Computers; Switched mode power supplies; Processor scheduling; Memory management; Libraries; Transistors; Synchronization; Message systems; Concurrent computing
Conor J. Williams, James Elliott
Department of Materials Science and Metallurgy, University of Cambridge, Cambridge, U.K.