Fri 22 Jun 2018 16:10 - 16:35 at Grand Ballroom AB - Parallelism Chair(s): Julian Dolby

In this paper, we develop an approach to GPU kernel optimization by focusing on identification of bottleneck resources and determining optimization parameters that can alleviate the bottleneck. Performance modeling for GPUs is done by abstract kernel emulation along with latency/gap modeling of resources. Sensitivity analysis with respect to resource latency/gap parameters is used to predict the bottleneck resource for a given kernel's execution. The utility of the bottleneck analysis is demonstrated in two contexts: 1) Coupling the new bottleneck-driven optimization strategy with the OpenTuner auto-tuner: experimental results on all kernels from the Rodinia suite and GPU tensor contraction kernels from the NWChem computational chemistry suite demonstrate effectiveness. 2) Manual code optimization: two case studies illustrate the use of the bottleneck analysis to iteratively improve the performance of code from state-of-the-art domain-specific code generators.
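The idea of predicting a bottleneck through sensitivity analysis can be illustrated with a small sketch. This is not the model from the paper: the cost formula, the resource names, the numbers, and the functions `predicted_time`, `sensitivities`, and `bottleneck` are all hypothetical, chosen only to show the general technique. The sketch scales each resource's latency and gap parameters in turn and reports the resource whose scaling changes the predicted time the most.

```python
from typing import Dict

# Hypothetical per-resource parameters for one kernel (illustrative values only).
OP_COUNTS = {"global_memory": 1000, "shared_memory": 4000, "alu": 8000}
GAPS = {"global_memory": 8.0, "shared_memory": 1.0, "alu": 0.5}          # cycles between issues
LATENCIES = {"global_memory": 400.0, "shared_memory": 30.0, "alu": 10.0}  # cycles per operation


def predicted_time(op_counts: Dict[str, int],
                   gaps: Dict[str, float],
                   latencies: Dict[str, float]) -> float:
    """Toy model (an assumption): each resource needs count * gap + latency
    cycles, and the slowest resource bounds the kernel."""
    return max(op_counts[r] * gaps[r] + latencies[r] for r in op_counts)


def sensitivities(op_counts: Dict[str, int],
                  gaps: Dict[str, float],
                  latencies: Dict[str, float],
                  scale: float = 1.1) -> Dict[str, float]:
    """Relative change in predicted time when one resource's gap and
    latency are both multiplied by `scale`."""
    base = predicted_time(op_counts, gaps, latencies)
    result = {}
    for r in op_counts:
        scaled_gaps = dict(gaps)
        scaled_gaps[r] = gaps[r] * scale
        scaled_latencies = dict(latencies)
        scaled_latencies[r] = latencies[r] * scale
        perturbed = predicted_time(op_counts, scaled_gaps, scaled_latencies)
        result[r] = (perturbed - base) / base
    return result


def bottleneck(op_counts: Dict[str, int],
               gaps: Dict[str, float],
               latencies: Dict[str, float]) -> str:
    """Resource with the largest sensitivity under the toy model."""
    s = sensitivities(op_counts, gaps, latencies)
    return max(s, key=s.get)


if __name__ == "__main__":
    print(sensitivities(OP_COUNTS, GAPS, LATENCIES))
    print(bottleneck(OP_COUNTS, GAPS, LATENCIES))
```

With these made-up numbers, only the global-memory parameters affect the predicted time, so that resource is reported as the bottleneck. A realistic model would account for overlap and contention between resources rather than taking a simple maximum.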

Fri 22 Jun

Displayed time zone: Eastern Time (US & Canada)

16:10 - 17:25
Parallelism (PLDI Research Papers) at Grand Ballroom AB
Chair(s): Julian Dolby (IBM Thomas J. Watson Research Center)
16:10 (25m) Talk: GPU Code Optimization using Abstract Kernel Emulation and Sensitivity Analysis
PLDI Research Papers
Changwan Hong, Aravind Sukumaran-Rajam (Ohio State University, USA), Jinsung Kim (Ohio State University, USA), Prashant Singh Rawat, Sriram Krishnamoorthy (Pacific Northwest National Laboratories), Louis-Noël Pouchet (Colorado State University), Fabrice Rastello (INRIA), P. Sadayappan (Ohio State University)
16:35 (25m) Talk: Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics
PLDI Research Papers
Roshan Dathathri (University of Texas at Austin, USA), Gurbinder Gill (University of Texas at Austin, USA), Loc Hoang (University of Texas at Austin, USA), Hoang-Vu Dang (University of Illinois at Urbana-Champaign, USA), Alex Brooks (University of Illinois at Urbana-Champaign, USA), Nikoli Dryden (University of Illinois at Urbana-Champaign, USA), Marc Snir (UIUC), Keshav Pingali (University of Texas at Austin, USA)
17:00 (25m) Talk: Heartbeat Scheduling: Provable Efficiency for Nested Parallelism
PLDI Research Papers
Umut A. Acar (Carnegie Mellon University), Arthur Charguéraud (Inria), Adrien Guatto, Mike Rainey, Filip Sieczkowski (University of Wrocław)