Exploiting parallelism is a key challenge in programming modern systems across a wide range of application domains and platforms. From the world's largest supercomputers to embedded DSPs, OpenMP provides a model for parallel programming that a compiler can understand and optimize. While LLVM's optimizer has not traditionally been involved in OpenMP's implementation, with all of the outlining logic and translation into runtime-library calls residing in Clang, several groups have been experimenting with implementation techniques that push some of this translation process into LLVM itself. This allows the optimizer to simplify these parallel constructs before they're transformed into runtime calls and outlined functions.
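To make the traditional flow concrete: Clang outlines the body of each parallel region into a separate function and replaces the construct with a call into the OpenMP runtime (libomp's __kmpc_fork_call entry point), so the optimizer only ever sees an opaque call. The sketch below illustrates this lowering in simplified form; the outlined helper's name and signature are illustrative, not Clang's exact output.

    #include <omp.h>
    #include <stdio.h>

    void hello(void) {
        /* Source-level view: a single parallel region. */
        #pragma omp parallel
        printf("hello from thread %d\n", omp_get_thread_num());
    }

    /* Conceptually, the frontend outlines the region body into a helper
     * and replaces the pragma with a runtime call, roughly:
     *
     *   static void outlined_body(int *global_tid, int *bound_tid) {
     *       printf("hello from thread %d\n", omp_get_thread_num());
     *   }
     *   ...
     *   __kmpc_fork_call(&loc, 0, (kmpc_micro)outlined_body);
     *
     * Because this happens entirely in the frontend, LLVM's optimizer
     * never sees the parallel region itself, only the opaque call and
     * the outlined function. */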
We've experimented with several techniques for implementing a parallel IR in LLVM, including adding intrinsics to represent OpenMP constructs (as proposed by Intel and others) and using Tapir (an experimental extension to LLVM originally developed at MIT), and have used these to lower both parallel loops and tasks. Nearly all parallel IR techniques allow analysis information to flow from the surrounding serial code into the parallel code, enabling further optimization. On top of that, we've implemented parallelism-specific optimizations such as fusing parallel regions and removing redundant barriers. In this talk, we'll report on these results and other aspects of our experiences working with parallel extensions to LLVM's IR.
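As a hedged sketch of the kind of rewrites meant here (the actual transformations operate on the IR, not on source), the example below expresses, in OpenMP C, what fusing two adjacent parallel regions and removing a redundant barrier amount to: one fork/join replaces two, and the implicit barrier ending the last worksharing loop is dropped because the region's own join barrier immediately follows it. The function and array names are placeholders.

    #include <stddef.h>

    extern double f(size_t i);
    extern double g(double x);

    /* Before: two adjacent parallel regions mean two fork/join cycles,
     * each with its own implicit barrier. */
    void before(double *a, double *b, size_t n) {
        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i) a[i] = f(i);
        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i) b[i] = g(a[i]);
    }

    /* After fusing the regions: a single fork/join. The implicit
     * barrier at the end of the first loop is kept, since it orders
     * the writes to a[] before the reads in the second loop. The
     * barrier ending the second loop is immediately followed by the
     * region's join barrier, so it is redundant and can be removed
     * (expressed here as `nowait`). */
    void after(double *a, double *b, size_t n) {
        #pragma omp parallel
        {
            #pragma omp for
            for (size_t i = 0; i < n; ++i) a[i] = f(i);
            #pragma omp for nowait
            for (size_t i = 0; i < n; ++i) b[i] = g(a[i]);
        }
    }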