Date : Dec. 17, 2021, 3 p.m. - BARDAKOFF Alexandre - Amphi Garcia
Analysis and Execution of a Data-Flow Graph Explicit Model Using Static Metaprogramming
As parallel computers with multiple CPUs and GPUs become increasingly affordable, this manuscript proposes a framework called Hedgehog to achieve efficient parallel computation on a single heterogeneous node. It is the successor to the HTGS framework, bringing more extensibility, expressiveness, and safety to the design of parallel code. It uses a data-flow graph programming model, with intrinsic parallelism and advanced composability, to express coarse-grained parallel execution, and relies on data pipelining to overlap data movement and computation in order to fully exploit the available hardware. This model is executed unchanged, without any specific scheduling or load balancing, contrary to other common task-based approaches.
This specificity enables a developer to better understand how the computation is actually conducted on specific hardware and, with the low-cost visual feedback offered by the library, to identify issues in the data-flow graph structure as well as potential bottlenecks. An image processing library and a linear algebra library have been designed based on Hedgehog. Experiments reveal a latency between 1 μs and 10 μs between tasks, which allows our implementation of matrix multiplication to achieve more than 95% of the theoretical peak performance on a heterogeneous node with 4 NVIDIA GPUs. With the inherent streaming capabilities of the model, we observe, in the case of chained algorithms, that the second algorithm starts more than 40× earlier than when executing the algorithms in sequence, with our implementations of LU decomposition and matrix multiplication.
To provide safety during development with Hedgehog, checks have been integrated into the library using C++ metaprogramming techniques; they alert the developer at compile time (or while writing code, with some IDEs) and prevent failures at runtime. The first level of checking is based on common template metaprogramming techniques combined with concepts to detect incompatible or potentially unsafe elements while building the graph structure. The second level of checking is based on generalized constant expressions to perform analyses on the overall graph structure at compile time and warn of potential issues such as data races and deadlocks. These metaprogramming techniques imply no overhead at runtime, and experiments on compilation performance show that only the second level of checking induces a significant overhead (following the theoretical complexity of the algorithms used), which we consider worth the benefit of increased safety.
KEYWORDS: parallel computing, task graph, data-flow graph, data pipelining, static metaprogramming, generalized constant expressions, template metaprogramming.
Jury president: M. David Hill
Reviewers: M. Joël Falcou, M. Mamadou Kaba Traoré
Examiners: Mme Jian Ji Li, M. Shuvra S. Bhattacharyya
Guest: M. Timothy Blattner
Advisors: Bruno Bachelet, Loïc Yon, Walid Keyrouz