Using the Tapir Compiler Intermediate Representation to Support Efficient Parallel Languages
This presentation gives an overview of Tapir, which embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir enables effective compiler optimization of recursive fork-join programs with only minor changes to existing compiler analyses and code transformations.
Tapir leverages the “serial-projection property,” which is commonly satisfied by recursive fork-join programs, to handle the semantics of these programs without an extensive rework of the compiler. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program’s control-flow graph. This ordered representation of parallel tasks allows the compiler to optimize parallel codes effectively with only minor modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified less than 300030003000 lines of the half-million-line core middle-end functionality in LLVM version~6.
These changes suffice to enable LLVM’s existing compiler optimizations for serial code — including loop-invariant-code motion, common-subexpression elimination, and tail-recursion elimination — to work with parallel control constructs such as parallel loops and Cilk’s cilk_spawn keyword. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 353535 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 303030 benchmarks, with faster 181818-core running times for 262626 of them, compared to a nearly identical compiler that compiles parallel linguistic constructs the traditional way. In addition, by integrating Tapir/LLVM into the Accelerated Linear Algebra (XLA) compiler in Google’s TensorFlow machine-learning framework, Tapir/LLVM enables more effective compiler optimization of a variety of neural networks written in TensorFlow, improving the parallel performance of these networks by a geometric-mean multiplicative factor of 30%–-100% across a variety of CPU hardware architectures.
This presentation will overview how Tapir supports compiler optimization of recursive fork-join programs. The talk will examine Tapir and how it enables compiler optimization of parallel programs using existing compiler optimizations for serial code and new parallel optimizations, such as loop scheduling and loop stripmining, that restructure the parallel control flow of the program. The talk will describe how Tapir enables efficient and effective debugging of compiler optimizations for parallel programs. The talk will discuss some theoretical foundations for compiler optimizations based on Tapir.
Monday HILT zoom room – Monday HILT YouTube – HILT Clowdr Break Room