Celerity — High-Level Distributed Accelerator C++ Programming


Date
Feb 15, 2024 11:00 AM — 12:00 PM
Location
Obergurgl, Austria

While domain-specific HPC software packages continue to thrive and are vital to many scientific communities, a general-purpose, high-productivity GPU cluster programming model that facilitates experimentation by non-experts remains elusive.

Celerity, a combined API and task-based runtime system for programming distributed-memory GPU-based HPC hardware platforms, seeks to make scaling C++ applications to distributed-memory accelerator clusters relatively easy by building on the SYCL domain-specific embedded language. By describing the logical and spatial buffer access behavior of their kernels, users enable the Celerity runtime system to split work across multiple GPUs automatically. The runtime ensures the correctness of the distributed program by tracking kernel data dependencies, which are encoded in an execution graph, and issuing data transfers when required. This flexible design makes effective use of hardware resources possible without manual scheduling.

To further increase productivity, Celerity provides an easy-to-use API that offers frequently encountered higher-order parallel programming patterns such as reductions, stencils and parallel I/O. A fully functional implementation is developed and maintained at the University of Innsbruck and is currently being used to port benchmarks and applications. This talk will present the Celerity concept, current results and ongoing research.