If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a SIMD program, several targets of a branch might be executed because of divergence. Especially for irregular data-parallel workloads, it is crucial to avoid if-converting non-divergent branches to increase SIMD utilization. In this paper, we present partial linearization, a simple and efficient if-conversion algorithm that overcomes several limitations of existing if-conversion techniques. In contrast to prior work, it has provable guarantees on which non-divergent branches are retained and will never duplicate code or insert additional branches. We show how our algorithm can be used in a classic loop vectorizer as well as to implement data-parallel languages such as ISPC or OpenCL. Furthermore, we implement prior vectorizer optimizations on top of partial linearization in a more general way. We evaluate the implementation of our algorithm in LLVM on a range of irregular data analytics kernels, a neutronics simulation benchmark, and NAB, a molecular dynamics benchmark from SPEC2017, on AVX2, AVX512, and ARM Advanced SIMD machines, and report speedups of up to 146% over ICC, GCC, and Clang O3.
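To make the terminology concrete, here is a minimal sketch of plain if-conversion in C. It is not the partial linearization algorithm from the paper. The function names, the `LANES` width, and the `bias_on` parameter are hypothetical and chosen only for illustration. The branch on `bias_on` has the same outcome for every lane (non-divergent), so it can remain a real branch. The branch on `v > 0` may differ per lane (divergent), so both sides are computed and a per-lane mask selects the result.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* hypothetical SIMD width for this sketch */

/* Scalar original: every element takes exactly one side of each branch. */
void scalar_kernel(const int *in, int *out, size_t n, int bias_on) {
    for (size_t i = 0; i < n; ++i) {
        int v = in[i];
        if (bias_on)          /* uniform: same outcome for all elements */
            v += 10;
        if (v > 0)            /* varying: outcome depends on the data */
            out[i] = v * 2;
        else
            out[i] = v - 1;
    }
}

/* If-converted form, written lane by lane to mimic one SIMD vector.
 * Assumes n is a multiple of LANES to keep the sketch short. */
void if_converted_kernel(const int *in, int *out, size_t n, int bias_on) {
    assert(n % LANES == 0);
    for (size_t base = 0; base < n; base += LANES) {
        int v[LANES], mask[LANES], then_val[LANES], else_val[LANES];

        for (int l = 0; l < LANES; ++l)
            v[l] = in[base + l];

        /* Non-divergent branch: retained as control flow, so the whole
         * vector either executes the block or skips it. */
        if (bias_on) {
            for (int l = 0; l < LANES; ++l)
                v[l] += 10;
        }

        /* Divergent branch: replaced by a mask plus both sides. */
        for (int l = 0; l < LANES; ++l)
            mask[l] = v[l] > 0;
        for (int l = 0; l < LANES; ++l)
            then_val[l] = v[l] * 2;   /* executed for all lanes */
        for (int l = 0; l < LANES; ++l)
            else_val[l] = v[l] - 1;   /* executed for all lanes */
        for (int l = 0; l < LANES; ++l)
            out[base + l] = mask[l] ? then_val[l] : else_val[l];
    }
}
```

Both functions produce identical results. The difference is that the if-converted version always pays for both sides of the data-dependent branch, which is why keeping non-divergent branches intact matters for SIMD utilization.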
Fri 22 Jun
|11:00 - 11:25|
Aravind Acharya (Indian Institute of Science, Bangalore), Uday Bondhugula (Indian Institute of Science), Albert Cohen (Inria, France / ENS, France)
|11:25 - 11:50|
|11:50 - 12:15|
Dong Chen (University of Rochester), Fangzhou Liu (University of Rochester), Chen Ding (University of Rochester), Sreepathi Pai (University of Rochester)