| <!--#include virtual="../../header.incl" --> |
| |
| <div class="www_sectiontitle" id="top">2019 European LLVM Developers Meeting</div> |
| <div style="float:left; width:68%;"> |
| <div style="width:100%;"> |
| <ul> |
| <li><a href="index.html">Conference main page</a></li> |
| <li><b>Conference Dates</b>: April 8-9, 2019</li> |
| <li><b>Location</b>: <a href="https://www.leplaza-brussels.be/en/"> |
| Le Plaza Brussels, 118 - 126 boulevard Adolphe Max, 1000 Brussels, Belgium</a></li> |
| </ul> |
| </div> |
| |
| <div class="www_sectiontitle" id="about">About</div> |
| <p>The meeting serves as a forum for LLVM, Clang, LLDB and other LLVM project |
| developers and users to get acquainted, learn how LLVM is used, and exchange |
| ideas about LLVM and its (potential) applications.</p> |
| |
| <p>The conference includes: |
| <ul> |
| <li><a href="#Keynote">Keynote</a></li> |
| <li><a href="#Tutorial">Tutorials</a></li> |
| <li><a href="#Talk">Technical talks</a></li> |
| <li><a href="#LightningTalk">Lightning talks</a></li> |
| <li><a href="#BoF">BoFs</a></li> |
| <li><a href="#Poster">Poster session</a></li> |
| <li>and a reception. </li> |
| </ul> |
| </p> |
| |
| <!-- *********************************************************************** --> |
| |
| <div class="www_sectiontitle" id="Keynote">Keynote</div> |
| <table cellpadding="10"> |
| |
| <tr><td valign="top" id="Keynote_1"> |
| <b>MLIR: Multi-Level Intermediate Representation for Compiler Infrastructure</b> |
| [ <a href="https://www.youtube.com/watch?v=qzljG6DKgic">Video</a> ] |
| [ <a href="slides/Keynote-ShpeismanLattner-MLIR.pdf">Slides</a> ]<br> |
| <i>Tatiana Shpeisman (Google), Chris Lattner (Google)</i> |
| <p>This talk will give an overview of Multi-Level Intermediate Representation - |
| a new intermediate representation designed to provide a unified, flexible and |
| extensible intermediate representation that is language-agnostic and can be |
| used as a base compiler infrastructure. MLIR shares similarities with |
| traditional CFG-based three-address SSA representations (including LLVM IR or |
| SIL), but it also introduces notions from the polyhedral domain as first class |
| concepts. The notion of dialects is a core concept of MLIR extensibility, |
| allowing multiple levels in a single representation. MLIR supports the |
| continuous lowering from dataflow graphs to high-performance target specific |
| code through partial specialization between dialects. We will illustrate in |
| this talk how MLIR can be used to build an optimizing compiler infrastructure |
| for deep learning applications.</p> |
| <p>MLIR supports multiple front- and back-ends and uses LLVM IR as one of its |
| primary code generation targets. MLIR also relies heavily on design principles |
| and practices developed by the LLVM community. For example, it depends on LLVM |
| APIs and programming idioms to minimize IR size and maximize optimization |
| efficiency. MLIR uses LLVM testing utilities such as FileCheck to ensure robust |
| functionality at every level of the compilation stack, TableGen to express IR |
| invariants, and it leverages LLVM infrastructure such as dominance analysis to |
| avoid implementing all the necessary compiler functionalities from scratch. At |
| the same time, it is a brand new IR, both more restrictive and more general |
| than LLVM IR in different aspects of its design. We believe that the LLVM |
| community will find in MLIR a useful tool for developing new compilers, |
| especially in machine learning and other high-performance domains.</p> |
| </td></tr> |
| </table> |
| |
| <div class="www_sectiontitle" id="Talk">Technical talks</div> |
| <table cellpadding="10"> |
| |
| <tr><td valign="top" id="Talk_1"> |
| <b>Switching a Linux distribution's main toolchains to LLVM/Clang</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Rosenkranzer-Switching_a_Linux_distro.pdf">Slides</a> ]<br> |
| <i>Bernhard Rosenkränzer (Linaro, OpenMandriva, LinDev)</i> |
| <p>OpenMandriva is the first general-purpose Linux distribution that has |
| switched its primary toolchain to Clang -- this talk will give an overview of |
| what we did, what problems we've faced, and where we're still having |
| problems (usually worked around by using gcc for some packages).</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_2"> |
| <b>Just compile it: High-level programming on the GPU with Julia</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Besard-Just_compile_it_high_level_programming_on_the_GPU_with_Julia.pdf">Slides</a> ]<br> |
| <i>Tim Besard (Ghent University)</i> |
| <p>High-level programming languages often rely on interpretation or compilation |
| schemes that are ill-suited for hardware accelerators like GPUs: These devices |
| typically require statically compiled, straight-line code in order to reach |
| acceptable performance. The high-level Julia programming language takes a |
| different approach, by combining careful language design with an LLVM-based JIT |
| compiler to generate high-quality machine code.</p> |
| <p>In this talk, I will show how we've used that capability to build a GPU |
| back-end for the Julia language, and explain the underlying techniques that |
| make it happen, including a high-level Julia wrapper for the LLVM libraries, |
| and interfaces to share functionality with the existing Julia code generator. I |
| will also demonstrate some of the powerful abstractions that we have built on |
| top of this infrastructure.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_3"> |
| <b>The Future of AST Matcher-based Refactoring</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Kelly-The_future_of_AST_matcherbased_refactoring.pdf">Slides</a> ]<br> |
| <i>Stephen Kelly</i> |
| <p>In the last few years, Clang has opened up new possibilities in C++ tooling |
| for the masses. Tools such as clang-tidy and clazy offer ready-to-use |
| source-to-source transformations. Available transformations can be used to |
| modernize (use newer C++ language features), improve readability (remove |
| redundant constructs), or improve adherence to the C++ Core Guidelines.</p> |
| <p>However, when special needs arise, maintainers of large codebases need to |
| learn some of the Clang APIs to create their own porting aids. The Clang APIs |
| necessarily form a more-exact picture of the structure of C++ code than most |
| developers keep in their heads, and bridging the conceptual gap can be a |
| daunting task.</p> |
| <p>This talk will show tools and features which make this task easier for |
| developers, ranging from <ul> |
| <li>Improvements to the clang-query interpreter</li> |
| <li>Improvements to the AST Matcher API</li> |
| <li>Information essential to creating clang-tidy checks</li> |
| <li>Debugging and profiling of AST Matchers</li> |
| <li>Advanced tooling </li> |
| </ul> |
| </p> |
| <p>These features are in various stages along the way to being upstreamed to |
| Clang. They enable new possibilities for large-scale refactoring in a |
| reasonable timeframe by solving problems of API discovery, guiding users in |
| creating working refactorings.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_4"> |
| <b>A compiler approach to Cyber-Security</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Ferriere-A_compiler_approach_to_cybersecurity.pdf">Slides</a> ]<br> |
| <i>François de Ferrière (STMicroelectronics)</i> |
| <p>STMicroelectronics is developing LLVM-based compilation tools for its |
| proprietary processors and also for the ARM cores. Applications, among them an |
| increasing number of IoT developments, require more and more security, |
| implemented in hardware, in software, or in both. To implement complex and |
| reliable software countermeasures that can be deployed in a timely manner, we |
| are adding specific cybersecurity code-generation features to our production |
| LLVM compiler, which we present in this talk.</p> |
| <p>We give implementation details on how we worked within Clang and LLVM to |
| implement these techniques, and we explain how they help reinforce software |
| protection. We also detail how we can restrict these transformations |
| to specific safety-critical regions of a program to meet the industrial |
| constraints on performance and code size of our applications.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_5"> |
| <b>Compiler Optimizations for (OpenMP) Target Offloading to GPUs</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Doerfert-Compiler_optimization_for_OpenMP_accelerator_offloading.pdf">Slides</a> ]<br> |
| <i>Johannes Doerfert (Argonne National Laboratory), Hal Finkel (Argonne National Laboratory)</i> |
| <p>The support of OpenMP target offloading in Clang is steadily increasing. |
| However, when it comes to the optimization of such codes, LLVM is still doing a |
| horrible job. Early separation into different modules and state machine |
| generation are only two reasons why the middle and backend have a hard time |
| generating efficient code.</p> |
| <p>In this talk, we want to focus on code offloading to GPUs (through OpenMP), |
| an increasingly important part of modern programming. We will first highlight |
| different reasons for missing optimizations and poor code quality before we |
| introduce new <em>practical</em> solutions. While our implementation is still |
| experimental, early results suggest that there is enormous optimization |
| potential in both manually written, and automatically generated, target |
| offloading code.</p> |
| <p>In addition to the talk, we will, closer to the conference date, initiate a |
| discussion on the LLVM mailing list and publish our implementation.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_6"> |
| <b>Handling massive concurrency: Development of a programming model for GPU and CPU</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Liedtke-Handling_massive_concurrency.pdf">Slides</a> ]<br> |
| <i>Matthias Liedtke (SAP)</i> |
| <p>For efficient parallel execution it is necessary to write massively |
| concurrent algorithms and to optimize memory access. In this session we show |
| our approach of a programming model that is able to execute the same concurrent |
| algorithm efficiently on GPUs and CPUs: Similar to OpenMP it allows the |
| programmer to describe concurrency and memory access declaratively but hides |
| complexity like memory transfers between the CPU and the GPU. In comparison to |
| OpenMP our model provides a higher level of expressiveness which enables us to |
| reach a performance comparable to OpenCL/CUDA.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_7"> |
| <b>Automated GPU Kernel Fusion with XLA</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Joerg-Automated_GPU_Kernel_Fusion_with_XLA.pdf">Slides</a> ]<br> |
| <i>Thomas Joerg (Google)</i> |
| <p>XLA (Accelerated Linear Algebra) is an optimizing compiler for linear |
| algebra that accelerates TensorFlow computations. The XLA compiler lowers to |
| LLVM IR and relies on LLVM for low-level optimization and code generation. XLA |
| achieves significant performance gains on TensorFlow models. We observed |
| speedups of up to 3x on internal models. The popular image classification model |
| ResNet-50 trains 1.6x faster.</p> |
| <p>A key optimization performed by XLA is automated GPU kernel fusion. The idea |
| is to combine multiple linear algebra operators into a single GPU kernel to |
| reduce memory bandwidth requirements and kernel launch overhead. TensorFlow |
| with XLA demonstrated competitive performance on MLPerf benchmarks (mlperf.org) |
| compared to ML frameworks that rely on manually fused, hand-tuned GPU |
| kernels.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_8"> |
| <b>The Helium Haskell compiler and its new LLVM backend</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-de-Wolff-Helium_Haskell_compiler.pdf">Slides</a> ]<br> |
| <i>Ivo Gabe de Wolff (University of Utrecht)</i> |
| <p>Helium, developed at the University of Utrecht, is a compiler for the |
| functional, lazy language Haskell. It is used for research on error diagnosis |
| and teaching. In this talk we will however focus on the new LLVM backend and |
| the compilation of high level features like lambdas, laziness (call-by-need |
| semantics), currying (partial application). Furthermore we discuss some high |
| level optimizations which cannot be done at LLVM-level.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_9"> |
| <b>Testing and Qualification of Optimizing Compilers for Functional Safety</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Cabrelles-Testing_and_qualification_of_optimizing_compilers_for_functional_safety.pdf">Slides</a> ]<br> |
| <i>José Luis March Cabrelles (Solid Sands)</i> |
| <p>In the development of embedded applications, the compiler plays a crucial |
| role in the translation from source to machine code. If the application is |
| safety-critical, functional safety standards such as ISO 26262 for the |
| automotive industry require that the user of the compiler develops confidence |
| in the compiler's correct operation. In this presentation we will discuss the |
| requirements of ISO 26262 on tools such as LLVM compilers and how they can be |
| met with a testing procedure that works well with the V-Model of |
| engineering.</p> |
| <p>As the name implies, functional safety standards deal with specified |
| functionality of components. But what about the optimizations that an LLVM-based |
| compiler applies to the program, sometimes even silently? Optimizations are not |
| even mentioned in the language standards for C and C++ - they are |
| "non-functional" behavior of the compiler. As we will demonstrate, |
| ignoring optimizations will lead to significant holes in the compiler's |
| test coverage. We will show how we have developed a technique that achieves |
| good results with optimization testing and have some errors in Intel's |
| well-regarded Clang-based compiler to show. To show the completeness of our |
| method for the requirements of functional safety, we have analyzed how the |
| tests match with the various LLVM IR-level transformation passes that they go |
| through.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_10"> |
| <b>Improving Debug Information in LLVM to Recover Optimized-out Function Parameters</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Prica-Improving_LLVM_DebugInfo.pdf">Slides</a> ]<br> |
| <i>Nikola Prica (RT-RK), Djordje Todorovic (RT-RK), Ananthakrishna Sowda (CISCO), Ivan Baev (CISCO)</i> |
| <p>Software release products are compiled with optimization level -O2 or |
| higher. Such products might produce a core file that is used to investigate |
| the cause of the problem that produced it. Debug analysis usually starts from |
| the call trace of a crash. In such traces, most of the parameters are |
| reported as optimized out for a variety of reasons. Some of the parameters |
| really are optimized out, but the locations of others could be calculated. Expert |
| software developers are able to find the values parameters had at the function |
| entry point by searching for those values in the disassembly of the caller |
| frame at the site of that particular function call. |
| Automation of this technique is described by the DWARF 5 specification and |
| has been implemented in GCC and GDB since 2011. The goal of this paper is to |
| present the ideas, implementation, and problems that we encountered while |
| working on this feature in LLVM. We will also show the improvement by |
| presenting recovered parameters in some call traces. This feature should |
| improve debugging of optimized code built with LLVM by recovering optimized-out |
| function parameters.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_11"> |
| <b>LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Kreindl-LLVM_IR_in_GraalVM.pdf">Slides</a> ]<br> |
| <i>Jacob Kreindl (Johannes Kepler University Linz)</i> |
| <p>Sulong is an execution engine for LLVM bitcode that has support for |
| debugging programs at the level of source code as well as textual LLVM IR. It |
| is part of GraalVM, a polyglot virtual machine that can also execute programs |
| written in multiple dynamic programming languages such as Ruby and Python. |
| Sulong supports GraalVM's language-agnostic tooling interface to provide a |
| rich debugging experience to developers. This includes source-level debugging |
| of native extensions compiled to LLVM bitcode and the dynamic language programs |
| that use them, together in the same debugger session and front-end. Sulong also |
| enables developers to debug programs at the level of LLVM IR, including |
| stepping through the textual IR and inspecting the symbols it contains.</p> |
| <p>In this talk we will describe different ways GraalVM enables users to debug |
| programs that were compiled to LLVM bitcode. We will introduce the general |
| features of GraalVM-based debuggers by demonstrating source-level debugging of |
| a standalone C/C++ application. Building on this we will showcase GraalVM's |
| ability to provide a truly integrated debugging experience for native |
| extensions of dynamic language programs to users. We will further demonstrate |
| Sulong's support for debugging programs at the LLVM-IR level.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_12"> |
| <b>LLDB Reproducers</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Devlieghere-LLDB_reproducers.pdf">Slides</a> ]<br> |
| <i>Jonas Devlieghere (Apple)</i> |
| <p>The debugger, like the compiler, is a complex piece of software where bugs |
| are inevitable. When a bug is reported, one of the first steps in its life |
| cycle is trying to reproduce the problem. Given the number of moving parts in the |
| debugger, this can be quite challenging. Especially for more sophisticated |
| problems, small changes in the environment, the binary, its dependencies, or |
| debug information might hide the problem. Getting this right puts a heavy |
| burden on both the reporter and the developer.</p> |
| <p>Reproducers are a way to automate this process. They contain the necessary |
| information for a bug to occur again with minimal interaction from the |
| developer. For clang a reproducer consists of a script with the compiler |
| invocation and a pre-processed source file. Doing the same thing for the |
| debugger is much more complicated.</p> |
| <p>This talk discusses what was needed to have working reproducers for LLDB. It |
| goes into detail about what information was needed, how it was captured and |
| finally how the debugger uses it to reproduce an issue. The high level design |
| is addressed as well as some of the challenges, such as dealing with low-level |
| details, remote debugging, and the SB API. It concludes with an overview of |
| what is possible and what isn't.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_13"> |
| <b>Sulong: An experience report of using the "other end" of LLVM in GraalVM.</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Schatz-Sulong_an_experience_report.pdf">Slides</a> ]<br> |
| <i>Roland Schatz (Oracle Labs), Josef Eisl (Oracle Labs)</i> |
| <p>The most common use-case for LLVM is to re-use its back-end to implement a |
| compiler for new programming languages. In project Sulong, we are going a |
| different route: We use LLVM frontends, and consume the resulting bitcode. |
| Sulong is the LLVM bitcode execution engine of GraalVM, a polyglot virtual |
| machine that executes JavaScript, Python, Ruby, R, and others. The goal of |
| Sulong is to bring C, C++, Fortran, and other languages that compile to LLVM |
| bitcode into the system, and allow low-cost interoperability across language |
| borders. The latter is crucial for efficiently supporting existing native |
| interfaces of dynamic languages.</p> |
| <p>In this talk, we want to share our experience with implementing an engine |
| for executing LLVM IR in GraalVM. We will discuss how Sulong executes LLVM |
| bitcode and why this allows high-performance interoperability between |
| languages. We will show the challenges of implementing existing native |
| interfaces in new runtime environments, and how we use the different parts of |
| the LLVM project for solving them. We want to focus on situations we found |
| challenging and where we think we can contribute to the project.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_14"> |
| <b>SYCL compiler: zero-cost abstraction and type safety for heterogeneous computing</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Savonichev-SYCL_compiler.pdf">Slides</a> ]<br> |
| <i>Andrew Savonichev (Intel)</i> |
| <p>SYCL is an abstraction layer for C++ that allows a developer to write |
| heterogeneous programs in a "single source" model, where host and |
| device code are written in the same file. Utilizing modern C++ features, SYCL |
| provides a way to develop type-safe and efficient programs for various |
| accelerator devices.</p> |
| <p>Although SYCL is designed as an "extension-free" standard C++ API, |
| some compiler extensions are needed to enable C++ code execution |
| on accelerators. The SYCL compiler is responsible for "extracting" the device |
| part of the code and compiling it to SPIR-V format or a device-native binary. In |
| addition, the compiler should also emit auxiliary information, which is |
| used by the SYCL runtime to run device code via the OpenCL API.</p> |
| <p>This talk will go over technical details of the SYCL compiler, and the |
| changes we need to make in order to bring full support for SYCL into upstream |
| LLVM and Clang as described in the RFC: |
| <a href="https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html"> |
| https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html</a></p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_15"> |
| <b>Handling all Facebook requests with JITed C++ code</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Guo-Zhou-Handling_all_Facebook_requests_with_JITed_C++_code.pdf">Slides</a> ]<br> |
| <i>Huapeng Zhou (Facebook), Yuhan Guo (Facebook)</i> |
| <p>Facebook needs an efficient scripting framework to enable fast iteration of |
| HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine and |
| code deployment ecosystem was created to compile/link/execute C++ script at |
| run-time, using Clang and LLVM ORC APIs. The framework allows developers to |
| write business logic and unit tests in C++ script, as well as debug using GDB. |
| Profiling using perf is also supported for PGO purposes. This new framework |
| outperformed another previously used scripting language by up to 4X, measured |
| in execution time.</p> |
| <p>In order to power the C++ script in an ABI-compatible way, a PCH (pre-compiled |
| header) is built statically to provide declarations and definitions of |
| necessary dependent types and methods. Clang APIs are then used at run-time to |
| transform source code to LLVM IR, which is later passed through LLVM ORC |
| layers for linking/optimizing. These Clang/LLVM toolchains are statically |
| linked into the main binary to ensure compatibility between PCH and C++ scripts. As |
| a result, scripts can be deployed in real time without any main binary |
| change.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_16"> |
| <b>clang-scan-deps: Fast dependency scanning for explicit modules</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Lorenz-clang-scan-deps_Fast_dependency_scanning_for_explicit_modules.pdf">Slides</a> ]<br> |
| <i>Alex Lorenz (Apple), Michael Spencer (Apple)</i> |
| <p>The dependency information that's provided by Clang can be used to |
| implement a pre-scanning phase for a build system that uses Clang modules in an |
| explicit manner, by discovering the required modules before compiling. However, |
| the traditional approach of preprocessing all sources to find the required |
| modular dependencies is typically not fast enough for a pre-scanning phase that |
| must run for every build. This talk introduces clang-scan-deps, an optimized |
| dependency discovery service that can provide a speedup of up to 10X over the |
| regular preprocessor-based scanning. This talk goes into details of how this |
| service is implemented and how it can be leveraged by the build system to |
| implement a fast pre-scanning phase for explicit Clang modules.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_17"> |
| <b>Clang tools for implementing cryptographic protocols like OTRv4</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Celi-Clang_tools_for_implementing_cryptographic_protocols_like_OTRv4.pdf">Slides</a> ]<br> |
| <i>Sofia Celi (Centro de Autonomia Digital)</i> |
| <p>OTRv4 is the newest version of the Off-The-Record protocol. It is a protocol |
| where the newest academic research intertwines with real-world implementations: |
| it provides end to end encryption, and offline and online deniability for |
| interactive and non-interactive applications. As a real world protocol, it |
| needs to provide an implementation that works for real world users. For this, |
| the OTRv4 team decided to implement it in C. But as we know, working in C can |
| be challenging due to several factors.</p> |
| <p>In order to make OTRv4's implementation much safer and more usable, we decided to |
| use several Clang tools, such as clang-format, clang-tidy and address |
| sanitizers. By using these tools, we uncovered bugs, issues and problems. In |
| this talk, we aim to highlight the most interesting bugs we uncovered by using |
| these tools, by comparing the results of using static analysis and a fast memory |
| error detector. We also aim to highlight the importance of using a specific |
| code formatting style, as it makes an implementation much clearer and makes |
| bugs easier to find. We plan to highlight the importance of using these tools on real |
| world implementations that are going to be used by millions of users and that |
| aim to provide the best security properties available.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_18"> |
| <b>Implementing the C++ Core Guidelines' Lifetime Safety Profile in Clang</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Horvath-Implementing_the_C++_Core_Guidelines_Lifetime.pdf">Slides</a> ]<br> |
| <i>Gabor Horvath (Eotvos Lorand University), Matthias Gehre (Silexica GmbH), |
| Herb Sutter (Microsoft)</i> |
| <p>This is an experience report of the Clang-based implementation of Herb |
| Sutter's Lifetime safety profile for the C++ Core Guidelines, available |
| online at cppx.godbolt.org.</p> |
| <p>We will cover the kinds of diagnoses supported by the checker and how they |
| are implemented using Clang's control flow graph. We will discuss |
| the main problems of the current prototype and what we are going to do to fix |
| them. We also plan to discuss the upstreaming process. Some parts of the |
| analysis might end up improving existing Clang warnings, some of which are on by |
| default. We will also summarize early experience with performance against |
| real-world code bases, including compile time performance for LLVM sources with |
| the checker.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_19"> |
| <b>Changes to the C++ standard library for C++20</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Clow-Changes_to_the_C++_standard_library_for_C++20.pdf">Slides</a> ]<br> |
| <i>Marshall Clow (CppAlliance)</i> |
| <p>The next version of the C++ standard will almost certainly be approved next |
| year, and be called C++20. There will be many new features in the standard |
| library in C++20, such as ranges, concepts, calendar support, and many |
| others. In this talk, I'll give an overview of the new features, and an |
| update on the status of their implementation in libc++.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_20"> |
| <b>Adventures with RISC-V Vectors and LLVM</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Kruppe-Espasa-RISC-V_Vectors_and_LLVM.pdf">Slides</a> ]<br> |
| <i>Robin Kruppe (TU Darmstadt), Roger Espasa (Esperanto Technologies)</i> |
| <p>RISC-V is a free and open instruction set architecture (ISA) with an |
| established LLVM backend and numerous open-source and proprietary hardware |
| implementations. The work-in-progress vector extension adds standardized vector |
| processing, taking lessons both from traditional long-vector machines and from |
| packed-SIMD approaches that dominated industrial designs in the past few |
| decades. The resulting architecture aims to excel at various scales, from small |
| embedded cores to large HPC accelerators and everything in between.</p> |
| <p>In this talk you will learn about the RISC-V vector ISA as well as LLVM |
| support for it: vectorizing loops without needing scalar remainder handling, |
| vectors whose length is not known at compile time, a vector unit that can be |
| dynamically reconfigured for increased efficiency, and more.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_21"> |
| <b>A Tale of Two ABIs: ILP32 on AArch64</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Northover-A_tale_of_two_ABIs_ILP32_on_AArch64.pdf">Slides</a> ]<br> |
| <i>Tim Northover (Apple)</i> |
| <p>We faced the challenge of seamlessly running 32b application binaries on a |
| new 64b S4 chip, which has no hardware support to run 32b binaries. Translating |
| the ARM binaries directly to the new hardware would be hard, but when an |
| application is available in bitcode format, the task is much more feasible. |
| This talk opens the curtain for an inside look into the decisions and steps |
| taken to translate 32b bitcode for the new 64b hardware. It will discuss the |
| many design, implementation and verification challenges of introducing a new |
| ABI, arm64_32, which guarantees that the binaries for the new S4 chip are |
| compatible with the original 32b applications.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_22"> |
| <b>LLVM Numerics Improvements</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Berg-LLVM_numerics_improvements.pdf">Slides</a> ]<br> |
| <i>Michael Berg (Apple), Steve Canon (Apple)</i> |
| <p>Some LLVM based compilers currently provide two modes of floating point code |
| generation. The first mode, called fast-math, is where performance is the |
| primary consideration over numerical precision and accuracy. This mode does not |
| strictly follow the IEEE-754 standard, but has proven useful for applications |
| that do not require this level of precision. The second mode, called |
| precise-math, is where the compiler carefully follows the subset of behavior |
| defined in the IEEE standard that is applicable to conforming hardware targets. |
| This mode is primarily used for compute workloads and wherever fast-math |
| precision is inadequate; however, it runs much slower as it requires a larger |
| number of instructions in general. In practice neither of these modes is |
| particularly desirable. The fast-math mode ignores a significant portion of the |
| standard as it pertains to handling undefined values described as Not a Number |
| (NaNs) and Infinities (INFs), resulting in difficulties for certain workloads |
| when the hardware target computes these values correctly and performance |
| remains critical.</p> |
| <p>Until recently these two models were mutually exclusive; however, with the |
| addition of IR flags they need not be. For instance, the FastMath metadata |
| module flag drives behavior deemed numerically unsafe when it is enabled, by |
| indiscriminately enabling optimizations. With IR flags this behavior can be |
| enabled with much finer granularity, allowing various code forms to be fast or |
| precise together in one module. We call this mixed mode compilation. IR flags |
| can be used individually or paired to produce desired floating point behavior |
| under specified constraints with fine granularity of control. Optimization |
| passes have been modified under this new kind of control to produce this |
| behavior. This talk will describe the recent numerics work and discuss the |
| implications for front-ends and backends built with LLVM.</p> |
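| <p>As a rough illustration of why reassociation and NaN handling are treated as |
| numerically unsafe, the following Python sketch evaluates the same sum in two |
| association orders and shows the NaN self-comparison that a no-NaNs assumption |
| would allow a compiler to fold away. It is not taken from the talk: the helper |
| names are hypothetical, and Python floats stand in for IEEE-754 doubles.</p> |

```python
import math

def sum_strict(a, b, c):
    """Evaluate in source order: (a + b) + c."""
    return (a + b) + c

def sum_reassociated(a, b, c):
    """Evaluate as a + (b + c), as a reassociating optimizer might."""
    return a + (b + c)

def is_nan_strict(x):
    """IEEE-754 semantics: only a NaN compares unequal to itself."""
    return x != x

BIG = 2.0 ** 60  # neighbouring doubles are 256 apart at this magnitude

strict_result = sum_strict(BIG, -BIG, 1.0)      # (BIG - BIG) + 1.0
fast_result = sum_reassociated(BIG, -BIG, 1.0)  # BIG + (-BIG + 1.0)

print(strict_result, fast_result)  # 1.0 0.0
print(is_nan_strict(math.nan))     # True
```

| <p>The two sums differ because 1.0 is absorbed when it is added to -BIG first. A |
| per-instruction flag lets a front end permit that rewrite only where such a |
| difference is acceptable, which is the mixed mode the abstract describes.</p> |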
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_23"> |
| <b>DOE Proxy Apps: Compiler Performance Analysis and Optimistic Annotation Exploration</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Homerding-DOE_proxy_apps_compiler_performance_analysis_and_optimistic_annotation_exploration.pdf">Slides</a> ]<br> |
| <i>Brian Homerding (Argonne National Laboratory), Johannes Doerfert (Argonne National Laboratory)</i> |
| <p>The US Department of Energy proxy applications are simplified models of the |
| key components of various scientific computing workloads. These proxy |
| applications are useful for research and exploration in many areas, including |
| software technology. We have conducted performance analysis of these proxy |
| applications using Clang, GCC and some vendor compilers. These results have |
| identified and motivated our work on modelling the memory access of math |
| functions in Clang. We will discuss our design and our work to expose this |
| ability to encode function information to the developer. Additionally, |
| I will discuss my collaboration on a development tool designed to |
| explore both the performance potentially lost because of knowledge the developer |
| could encode (but did not) and the extent to which LLVM is able to profitably |
| make use of this information.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Talk_24"> |
| <b>Loop Fusion, Loop Distribution and their Place in the Loop Optimization Pipeline</b> |
| [ Video ] |
| [ <a href="slides/TechTalk-Barton-Loop_fusion_loop_distribution_and_their_place_in_the_loop_optimization_pipeline.pdf">Slides</a> ]<br> |
| <i>Kit Barton (IBM), Johannes Doerfert (Argonne National Lab), Hal Finkel |
| (Argonne National Lab), Michael Kruse (Argonne National Lab)</i> |
| <p>Loop fusion and loop distribution are two key optimizations that typically |
| are featured prominently in a loop optimization pipeline. They are used both to |
| improve performance of applications and also to enable other loop |
| optimizations. For example, loop fusion can improve the performance of |
| applications through increasing temporal data cache locality. It can also |
| increase the scope of other optimizations by creating larger loop nests for |
| intra-loop nest optimizations to work on. Similarly, loop distribution is often |
| used to improve performance directly by distributing loops that exceed hardware |
| resources (e.g., register pressure). It is also frequently used to distribute |
| loops containing loop-carried dependencies into two loops: one with loop |
| carried dependencies and the second with no loop carried dependencies; this |
| enables other optimizations (e.g., vectorization) on the independent loop. |
| Furthermore, these two optimizations can work nicely together, as they have the |
| ability to "undo" transformations done by the other. Thus, the |
| implementation of both of these optimizations must be robust as they can both |
| play an important role in a loop optimization pipeline.</p> |
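| <p>The two transformations can be pictured with a small Python sketch. It only |
| mirrors the loop shapes described above and is not the LLVM implementation; |
| all function names are hypothetical. The first pair shows fusion of two |
| adjacent loops with identical bounds, the second pair shows distribution of a |
| loop into a part with a loop-carried dependence and an independent part.</p> |

```python
def separate_loops(a, b):
    """Two adjacent loops over the same range; c is re-read in the second."""
    n = len(a)
    c = [0] * n
    d = [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = c[i] * 2
    return c, d

def fused_loop(a, b):
    """Fused form: c[i] is consumed right after it is produced."""
    n = len(a)
    c = [0] * n
    d = [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = c[i] * 2
    return c, d

def mixed_loop(a):
    """One loop mixing a loop-carried dependence with independent work."""
    n = len(a)
    prefix = [0] * n
    scaled = [0] * n
    running = 0
    for i in range(n):
        running += a[i]        # depends on the previous iteration
        prefix[i] = running
        scaled[i] = a[i] * 3   # independent of other iterations
    return prefix, scaled

def distributed_loops(a):
    """Distributed form: the second loop has no loop-carried dependence."""
    n = len(a)
    prefix = [0] * n
    scaled = [0] * n
    running = 0
    for i in range(n):
        running += a[i]
        prefix[i] = running
    for i in range(n):         # candidate for vectorization
        scaled[i] = a[i] * 3
    return prefix, scaled
```

| <p>Each pair computes the same results, and applying one transformation to the |
| output of the other recovers the original loop shape.</p> |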
| <p>This talk will be a follow-on to "Revisiting Loop Fusion, and its place |
| in the loop transformation framework", presented at the 2018 LLVM |
| Developers' Meeting. The patch to implement basic loop fusion described in |
| the talk is currently undergoing review on Phabricator (<a |
| href="https://reviews.llvm.org/D55851">https://reviews.llvm.org/D55851</a>). We |
| have prototypes to make loop fusion more aggressive by moving code from between |
| two loops (making them adjacent) that will be posted for review once the basic |
| loop fusion patch is accepted. We also have plans to peel loops (to make |
| their bounds conform), and improve the dependence analysis between the two loop |
| bodies. This talk will also include findings from our current analysis of the |
| loop distribution pass in LLVM. It will provide a summary of the strengths and |
| limitations of loop distribution, and summarize any improvements that are made |
| prior to EuroLLVM 2019. Finally, the presentation will discuss how loop fusion |
| and loop distribution can fit into the existing loop optimization pipeline in |
| LLVM.</p> |
| </td></tr> |
| </table> |
| |
| <div class="www_sectiontitle" id="Tutorial">Tutorials</div> |
| <table cellpadding="10"> |
| |
| <tr><td valign="top" id="Tutorial_1"> |
| <b>Tutorial: Building a Compiler with MLIR</b> |
| [ <a href="https://www.youtube.com/watch?v=cyICUIZ56wQ">Video</a> ] |
| [ <a href="slides/Tutorial-AminiVasilacheZinenko-MLIR.pdf">Slides</a> ]<br> |
| <i>Mehdi Amini (Google), Nicolas Vasilache (Google), Alex Zinenko (Google)</i> |
| <p>This tutorial will complement the technical talk about MLIR. We will |
| implement a custom DSL for numerical processing and walk the audience |
| step-by-step through the use of MLIR to support the lowering and the |
| optimization of such DSL, and target LLVM for lower level optimizations and |
| code generation or JIT execution.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Tutorial_2"> |
| <b>Building an LLVM-based tool: lessons learned</b> |
| [ Video ] |
| [ <a href="slides/Tutorial-Denisov-Building_an_LLVM_based_tool.pdf">Slides</a> ]<br> |
| <i>Alex Denisov</i> |
| <p>In this talk, I want to share my experience in building an LLVM-based tool.</p> |
| <p>For the last three years, I have been working on a tool for mutation testing. Currently, |
| it works on Linux, macOS, and FreeBSD and the source code is compatible with |
| any LLVM version between 3.9 and 7.0. Anything that can run in parallel |
| runs in parallel. I will cover the following topics:<ul> |
| <li>Build system: on supporting multiple LLVM versions and building against |
| sources or precompiled binaries.</li> |
| <li>Parallelization: which parts of the tool can be parallelized and which |
| should run in one thread.</li> |
| <li>Testing: how to build a robust test suite for the tool.</li> |
| <li>Bitcode: on several ways to convert a program into LLVM bitcode that can |
| be used by the tool.</li> |
| </ul></p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Tutorial_3"> |
| <b>LLVM IR Tutorial - Phis, GEPs and other things, oh my!</b> |
| [ Video ] |
| [ <a href="slides/Tutorial-Bridgers-LLVM_IR_tutorial.pdf">Slides</a> ]<br> |
| <i>Vince Bridgers (Intel Corporation), Felipe de Azevedo Piovezan (Intel Corporation)</i> |
| <p>LLVM intermediate representation (IR) is the abstract description of machine |
| operations used to translate LLVM front ends to a form that's executable by |
| a target machine. Optimizations and transformations are performed on the IR by |
| the LLVM library to create executable images. This tutorial will introduce the |
| IR syntax, describe basic tools for manipulating IR formats, and describe |
| mappings of IR from various common source code control structures. Tutorial |
| materials with specific examples will be made available for the tutorial |
| presentation, and for offline review.</p> |
| </td></tr> |
| </table> |
| |
| <div class="www_sectiontitle" id="SRC">Student Research Competition</div> |
| <table cellpadding="10"> |
| |
| <tr><td valign="top" id="SRC_1"> |
| <b>Safely Optimizing Casts between Pointers and Integers</b> |
| [ Video ] |
| [ <a href="slides/SRC-Lee-Safely_optimizing_casts_between_pointers_and_integers.pdf">Slides</a> ]<br> |
| <i>Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul |
| National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu |
| (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P. |
| Lopes (Microsoft Research, UK)</i> |
| <p>In this talk, a list of optimizations that soundly remove casts between |
| pointers and integers will be presented. In LLVM, a pointer is more than just |
| an integer: LLVM allows a pointer to track its underlying object, and the rule |
| to find it is defined as the based-on relation. This allows LLVM to aggressively |
| optimize loads and stores, but complicates the meaning of pointer-integer casts. |
| This causes conflicts between existing optimizations, leading to |
| long-standing miscompilation bugs such as bug 34548.</p> |
| <p>To fix it, we suggest disabling folding of inttoptr(ptrtoint(p)) to p and |
| using a safe workaround to remove them. This optimization is important because |
| it removes a significant portion of such cast pairs. We'll show that |
| even if the optimization is disabled, the majority of casts can be removed by |
| carefully adding new optimizations and modifying existing ones. After the |
| updates, the performance is still comparable to the original LLVM.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="SRC_2"> |
| <b>An alternative OpenMP Backend for Polly</b> |
| [ Video ] |
| [ <a href="slides/SRC-Halkenhauser-An_alternative_OpenMP_backend_for_Polly.pdf">Slides</a> ]<br> |
| <i>Michael Halkenhäuser (TU Darmstadt)</i> |
| <p>LLVM's polyhedral infrastructure framework Polly may automatically |
| exploit thread-level parallelism through OpenMP. Currently, the user can only |
| influence the number of utilized threads, while other OpenMP parameters such as |
| the scheduling type and chunk size are set to fixed values. This, in turn, |
| limits a user's ability to adapt the optimization process for a given |
| problem.</p> |
| <p>In this work, we present an alternative OpenMP backend for Polly, which |
| provides additional customization options to the user and is based on the LLVM |
| OpenMP runtime. We evaluate our new backend and the influence of the new |
| customization options on performance and compare to Polly's existing OpenMP |
| backend.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="SRC_3"> |
| <b>Implementing SPMD control flow in LLVM using reconverging CFGs</b> |
| [ Video ] |
| [ <a href="slides/SRC-Wahlster-Implementing_SPMD_control_flow_in_LLVM_using_reconverging_CFGs.pdf">Slides</a> ]<br> |
| <i>Fabian Wahlster (Technische Universität München), Nicolai |
| Hähnle (Advanced Micro Devices)</i> |
| <p>Compiling programs for an SPMD execution model, e.g. for GPUs or for whole |
| program vectorization on CPUs, requires a transform from the thread-level input |
| program into a vectorized wave-level program in which the values of the |
| original threads are stored in corresponding lanes of vectors. The main |
| challenge of this transform is handling divergent control flow, where threads |
| take different paths through the original CFG. A common approach, which is |
| currently taken by the AMDGPU backend in LLVM, is to first structurize the |
| program as a simplification for subsequent steps.</p> |
| <p>However, structurization is overly conservative. It can be avoided when |
| control flow is uniform, i.e. not divergent. Even where control flow is |
| divergent, structurization is often unnecessary. Moreover, LLVM's |
| StructurizeCFG pass relies on region analysis, which limits the extent to which |
| it can be evolved.</p> |
| <p>We propose a new approach to SPMD vectorization based on saying that a CFG |
| is reconverging if for every divergent branch, one of the successors is a |
| post-dominator. This property is weaker than structuredness, and we show that |
| it can be achieved while preserving uniform branches and inserting fewer new |
| basic blocks than structurization requires. It is also sufficient for code |
| generation, because it guarantees that threads which "leave" a wave |
| at divergent branches will be able to rejoin it later.</p> |
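| <p>The reconverging property can be checked mechanically on a small graph. The |
| Python sketch below is an illustration under stated assumptions and not the |
| authors' code: it reads "a post-dominator" as a post-dominator of the block |
| that ends in the divergent branch, assumes a single exit block reachable from |
| every block, and uses hypothetical names throughout.</p> |

```python
def post_dominators(cfg, exit_block):
    """Iterative data-flow solution: pdom(b) = {b} | intersection of pdom(succs)."""
    blocks = set(cfg)
    pdom = {b: set(blocks) for b in blocks}
    pdom[exit_block] = {exit_block}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == exit_block:
                continue
            new = set(blocks)
            for succ in cfg[b]:
                new &= pdom[succ]
            new |= {b}
            if new != pdom[b]:
                pdom[b] = new
                changed = True
    return pdom

def is_reconverging(cfg, exit_block, divergent_blocks):
    """True if every divergent branch has a successor that post-dominates it."""
    pdom = post_dominators(cfg, exit_block)
    return all(
        any(succ in pdom[b] for succ in cfg[b])
        for b in divergent_blocks
    )

# if/else diamond: neither arm post-dominates the branch block.
diamond = {"entry": ["then", "else"], "then": ["join"],
           "else": ["join"], "join": []}

# if/then triangle: the "join" successor post-dominates the branch block.
triangle = {"entry": ["then", "join"], "then": ["join"], "join": []}

print(is_reconverging(diamond, "join", {"entry"}))   # False
print(is_reconverging(triangle, "join", {"entry"}))  # True
print(is_reconverging(diamond, "join", set()))       # True (branch is uniform)
```

| <p>The last call reflects the point made above: a uniform branch imposes no |
| requirement, so the diamond can be left untouched when "entry" is not |
| divergent.</p> |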
| </td></tr> |
| |
| <tr><td valign="top" id="SRC_4"> |
| <b>Function Merging by Sequence Alignment</b> |
| [ Video ] |
| [ <a href="slides/SRC-Rocha-Function_merging_by_sequence_alignment.pdf">Slides</a> ]<br> |
| <i>Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of |
| Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of |
| Edinburgh), Hugh Leather (University of Edinburgh)</i> |
| <p>Resource-constrained devices for embedded systems are becoming increasingly |
| important. In such systems, memory is highly restrictive, making code size in |
| most cases even more important than performance. Compared to more traditional |
| platforms, memory is a larger part of the cost and code occupies much of it. |
| Despite that, compilers make little effort to reduce code size. One key |
| technique attempts to merge the bodies of similar functions. However, |
| production compilers only apply this optimization to identical functions, while |
| research compilers improve on that by merging the few functions with identical |
| control-flow graphs and signatures. Overall, existing solutions are |
| insufficient and we end up having to either increase cost by adding more memory |
| or remove functionality from programs.</p> |
| <p>We introduce a novel technique that can merge arbitrary functions through |
| sequence alignment, a bioinformatics algorithm for identifying regions of |
| similarity between sequences. We combine this technique with an intelligent |
| exploration mechanism to direct the search towards the most promising function |
| pairs. Our approach is more than 2.4x better than the state-of-the-art, |
| reducing code size by up to 25%, with an overall average of 6%, while |
| introducing an average compilation-time overhead of only 15%. When aided by |
| profiling information, this optimization can be deployed without any |
| significant impact on the performance of the generated code.</p> |
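| <p>To give a feel for sequence alignment applied to code, the following Python |
| sketch aligns two opcode sequences with the classic Needleman-Wunsch global |
| alignment. The abstract does not state which alignment variant or scoring the |
| authors use, so the algorithm choice, the scores, and the opcode lists here are |
| assumptions made for illustration only.</p> |

```python
def align(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns a list of aligned pairs.

    A pair (x, None) or (None, y) marks an element that has no partner.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            pairs.append((seq_a[i - 1], seq_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((seq_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, seq_b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

# Hypothetical opcode sequences of two similar functions.
func_f = ["load", "add", "store", "ret"]
func_g = ["load", "mul", "add", "store", "ret"]

alignment = align(func_f, func_g)
shared = [a for a, b in alignment if a is not None and a == b]
```

| <p>In this toy input, four of the five aligned positions hold the same opcode and |
| could share code in a merged function, while the unmatched "mul" would need |
| to be guarded so that it only runs on behalf of the second function.</p> |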
| </td></tr> |
| |
| <tr><td valign="top" id="SRC_5"> |
| <b>Compilation and optimization with security annotations</b> |
| [ Video ] |
| [ <a href="slides/SRC-TuanVu-Compilation_and_optimization_with_security_annotations.pdf">Slides</a> ]<br> |
| <i>Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM), |
| Albert Cohen (Google)</i> |
| <p>Program analysis and program transformation systems need to express |
| additional program properties, to specify test and verification goals, and to |
| enhance their effectiveness. Such annotations are typically inserted into the |
| representation on which the tool operates; e.g., source level for establishing |
| compliance with a specification, and binary level for the validation of secure |
| code. While several annotation languages have been proposed, these typically |
| target the expression of functional properties. For the purpose of implementing |
| secure code, there has been little effort to support non-functional properties |
| about side-channels or faults. Furthermore, analyses and transformations making |
| use of such annotations may target different representations encountered along |
| the compilation flow.</p> |
| <p>We extend an annotation language to express a wider range of functional and |
| non-functional properties, enabling security-oriented analyses and influencing |
| the application of code transformations along the compilation flow. We |
| translate this language to the different compiler representations from abstract |
| syntax down to binary code. We explore these concepts through the design and |
| implementation of an optimizing, annotation-aware compiler, capturing |
| annotations from the program source, propagating and emitting them in the |
| binary, so that binary-level analysis tools can use them.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="SRC_6"> |
| <b>Adding support for C++ contracts to Clang</b> |
| [ Video ] |
| [ <a href="slides/SRC-Lopez-Gomez-Adding_support_for_C++_contracts_to_clang.pdf">Slides</a> ]<br> |
| <i>Javier López-Gómez (University Carlos III of Madrid), J. |
| Daniel García (University Carlos III of Madrid)</i> |
| <p>A language supporting contract-checking makes it possible to detect programming errors. |
| Also, making this information available to the compiler may cause it to perform |
| additional optimizations.</p> |
| <p>This paper presents our implementation of the P0542R5 technical |
| specification (now part of the C++20 working draft).</p> |
| </td></tr> |
| </table> |
| |
| <div class="www_sectiontitle" id="LightningTalk">Lightning talks</div> |
| <table cellpadding="10"> |
| |
| <tr><td valign="top" id="LightningTalk_1"> |
| <b>LLVM IR Timing Predictions: Fast Explorations via lli</b> |
| [ Video ] |
| [ Slides ]<br> |
| <i>Alessandro Cornaglia (FZI - Research Center for Information Technology)</i> |
| <p>Many applications, especially in the embedded domain, have to be executed on |
| different hardware target platforms. For these applications, it is necessary to |
| evaluate both functional and non-functional properties, such as software |
| execution time, in all their hardware/software combinations. Especially in the |
| context of software product line engineering, it is not feasible to test all |
| variants one-by-one. The intermediate representation of the source code offers |
| an attractive opportunity for a single-run analysis, because it covers the |
| software variability, while at the same time omitting the hardware-dependent |
| optimizations.</p> |
| <p>We present an extension for the LLVM IR execution engines, which are part of |
| the LLVM lli tool. The extension evaluates on the fly functional and |
| non-functional properties for all the hardware variants during one lli |
| execution. In particular, our extension is designed for the evaluation of the |
| execution time of a program for multiple target platforms considering different |
| software variants. Both the interpreter and JIT execution modes are supported. |
| Prospectively, our approach will be enriched with multiple analysis techniques. |
| Thanks to our approach, it is now possible to evaluate software variants with |
| regard to multiple hardware platforms in a single lli execution run.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_2"> |
| <b>Simple Outer-Loop-Vectorization == LoopUnroll-And-Jam + SLP</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Das-AMD_OLV.pdf">Slides</a> ]<br> |
| <i>Dibyendu Das (AMD)</i> |
| <p>In this brief talk I will show how Outer-Loop-Vectorization (OLV), which is |
| of great interest to the LLVM community, can be visualized as a combination of |
| two transformations applied to a loop-nest of interest. These two |
| transformations are LoopUnrollAndJam and SLP. LoopUnrollAndJam is a fairly new |
| addition to the LLVM loop-optimization repertoire. Combined with a fairly |
| powerful SLP that LLVM supports today, we are able to vectorize the outer loop |
| of several important kernels automatically without the support of any pragma. |
| At present our implementation is at the level of a PoC and does not exploit any |
| rigorous costing mechanism. While we understand that OLV is being implemented |
| in the LoopVectorizer using the VPlan technique, this paper highlights a quick |
| and cheap way to solve the same problem in a different manner using two |
| existing transforms.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_3"> |
| <b>Clacc 2019: An Update on OpenACC Support for Clang and LLVM</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Denny-clacc.pdf">Slides</a> ]<br> |
| <i>Joel E. Denny (Oak Ridge National Laboratory), Seyong Lee (Oak Ridge |
| National Laboratory), Jeffrey S. Vetter (Oak Ridge National Laboratory)</i> |
| <p>We are developing production-quality, standard-conforming OpenACC [1] |
| compiler and runtime support in Clang and LLVM for the US Exascale Computing |
| Project [2][3]. A key strategy of Clacc's design is to translate OpenACC |
| to OpenMP in order to leverage Clang's existing OpenMP compiler and runtime |
| support and to minimize implementation divergence. To maximize reuse of the |
| OpenMP implementation and to facilitate research and development into new |
| source-level tools involving both the OpenACC and OpenMP levels, Clacc |
| implements this translation in the Clang AST using Clang's TreeTransform |
| facility. However, we are also following LLVM IR parallel extensions being |
| developed by the community as a path to improve compiler optimizations and |
| analyses.</p> |
| <p>The purpose of this talk is to provide an update on Clacc progress over the |
| preceding year including early performance results, to present the plan for the |
| year ahead, and to invite participation from others. Clacc's OpenACC |
| support is still maturing and we have not yet offered it upstream. However, we |
| have already upstreamed many mutually beneficial improvements from the Clacc |
| project, including improvements to LLVM's testing infrastructure and to |
| Clang and its OpenMP support. This talk will summarize those contributions as |
| well.</p> |
| <p>[1] OpenACC standard: <a href="https://www.openacc.org/">https://www.openacc.org/</a> </p> |
| <p>[2] Clacc: Translating OpenACC to OpenMP in Clang. Joel E. Denny, Seyong |
| Lee, and Jeffrey S. Vetter. 2018 IEEE/ACM 5th Workshop on the LLVM Compiler |
| Infrastructure in HPC (LLVM-HPC), Dallas, TX, USA, (2018).</p> |
| <p>[3] Clacc: OpenACC Support for Clang and LLVM. Joel E. Denny, Seyong Lee, |
| and Jeffrey S. Vetter. 2018 European LLVM Developers Meeting (EuroLLVM |
| 2018).</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_4"> |
| <b>Targeting a statically compiled program repository with LLVM</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Camp-Program_Repo.pdf">Slides</a> ]<br> |
| <i>Phil Camp (SN Systems), Russell Gallop (SN Systems)</i> |
| <p>Following on from the 2016 talk "Demo of a repository for statically |
| compiled programs", this lightning talk will present a brief overview of |
| how LLVM was modified to target a program repository. This includes adding a |
| new target output format and a new optimization pass to skip program elements |
| already present in the repository. Reference: <a |
| href="https://github.com/SNSystems/llvm-prepo">https://github.com/SNSystems/llvm-prepo</a></p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_5"> |
| <b>Does the win32 clang compiler executable really need to be over 21MB in size?</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Gallop-Does_win32_clang_compiler.pdf">Slides</a> ]<br> |
| <i>Russell Gallop (SN Systems), Greg Bedwell (SN Systems)</i> |
| <p>The title of this lightning talk is from a bug filed in the early days of the |
| PS4 compiler. It noted that the LLVM-based PS4 compiler was more than 3 times |
| larger than the PS3 compiler. Since then it has almost doubled to over 40MB. |
| For a compiler which targets one system this seems excessive. A large executable |
| can hurt cache performance and costs time when transferred for |
| distributed builds.</p> |
| <p>In this lightning talk I will look at where this comes from and how it can |
| be managed.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_6"> |
| <b>Resolving the almost decade old checker dependency issue in the Clang Static Analyzer</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Umann-Resolving_the_almost.pdf">Slides</a> ]<br> |
| <i>Kristóf Umann (Ericsson Hungary, Eötvös Loránd University)</i> |
| <p>As checkers grew in numbers in the Static Analyzer, the problem of certain |
| checkers depending on one another was inevitable. One particular problem, for |
| example, is that a checker called MallocChecker, which despite its name does |
| all sorts of memory allocation and de- or reallocation related checks, depends |
| on CStringChecker to model calls to strcmp. While these checkers are completely |
| separate entities, the Static Analyzer also contains large checker classes that |
| in fact expose multiple checkers to the user: For example, IteratorChecker has |
| a modeling part, and it exposes 3 iterator related checkers, and enabling any |
| of the three will also enable the unexposed modeling part. Having both of these |
| structures makes it difficult to find a solution where the developer (or the |
| experienced user) can easily see what checkers are enabled, as these |
| dependencies are only expressed in the implementation.</p> |
| <p>This talk is going to discuss elegant solutions as to how these rather |
| fragile checker structures can be preserved by declaring these dependencies in |
| TableGen files, how checker developers (and users) can ensure that when the |
| analyzer is invoked, only the requested checkers will be enabled, and also take |
| a very brief look at what other features the analyzer gained thanks to these |
| issues being resolved.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_7"> |
| <b>Adopting LLVM Binary Utilities in Toolchains</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Rupprecht-Adopting_LLVM_binary_utilities.pdf">Slides</a> ]<br> |
| <i>Jordan Rupprecht (Google)</i> |
| <p>Although many projects have migrated from GCC-based toolchains to |
| Clang-based ones, tools from the GNU Binutils collection are still widely used |
| despite having equivalents in the LLVM project. The problems faced when |
| attempting to use LLVM tools range anywhere from simple command line syntax |
| differences to unimplemented or buggy features. In this talk, I will describe |
| some of the types of challenges we faced when adopting LLVM tools, as well as |
| some of the strategies we used to test the toolchain.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_8"> |
| <b>Multiplication and Division in the Range-Based Constraint Manager</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Balogh-Multiplication_and_Division_in_the_Range-Based_Constraint_Manager.pdf">Slides</a> ]<br> |
| <i>Ádám Balogh (Ericsson Hungary Ltd.)</i> |
| <p>The default constraint manager of the Clang Static Analyzer is a simple |
| range-based constraint manager: it stores and manages the valid ranges for the |
| values of symbolic expressions. Upon new assumptions it further constrains |
| these ranges which often results in an empty range which tells the analyzer |
| that the assumption is impossible. Until now the constraint manager could |
| handle basic assumptions: A &lt;rel&gt; m, A + n &lt;rel&gt; m and A - n |
| &lt;rel&gt; m where A is a symbolic expression, n and m integer constants and |
| &lt;rel&gt; a relational operator. In the latter two cases where a constant is |
| added or subtracted from the symbolic expression the range of the additive |
| expression is calculated by adjusting the range circularly by the constant. |
| However, it could not cope with division and multiplication, thus not even the |
| range for A*2 could be deduced from the range of A. This shortcoming led to |
| both false positives and missed true positives.</p> |
| <p>To improve the true positive/false positive ratio of the analyzer we |
| extended the range-based constraint manager to be able to handle expressions of |
| the format A &lt;mul&gt; k &lt;add&gt; n &lt;rel&gt; m, where A is a symbolic |
| expression, k, m and n integer constants, &lt;mul&gt; a multiplicative operator |
| (* or /), &lt;add&gt; an additive operator (+ or -) and &lt;rel&gt; a |
| relational operator. The main challenge in our work was to correctly scale the |
| ranges in the circular arithmetic: for example in case of signed 8 bit types in |
| A * 2 == 56 the value of A could not only be 28, but also -100. Similarly, in A |
| / 3 == 4 the value of A is not necessarily 12, but anything in range [12..14]. |
| To ensure full correctness we also proved our solution: first we generated |
| every range for every constant in both the 8 bit signed and unsigned |
| arithmetic, then we tested whether the scaling algorithm calculates exactly the |
| same ranges. Finally we extrapolated this algorithm to wider integer types and |
| ported it to the range-based constraint manager. According to our measurements |
| there is no significant change in the performance and in the talk we will |
| present numbers of lost false positives and new true positives.</p> |
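| <p>The two 8-bit examples above can be reproduced by exhaustive enumeration, in |
| the spirit of the brute-force check the abstract mentions. This Python sketch |
| is not the analyzer's scaling algorithm: it only lists the values that satisfy |
| each constraint, assuming two's-complement wraparound for multiplication and |
| C-style division that truncates toward zero. All names are hypothetical.</p> |

```python
def wrap_int8(value):
    """Reduce an integer to the signed 8 bit range [-128, 127] (wraparound)."""
    return (value + 128) % 256 - 128

def trunc_div(a, b):
    """Integer division that truncates toward zero, as in C."""
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient

INT8_VALUES = range(-128, 128)

def solutions_mul(k, m):
    """All signed 8 bit A with A * k == m under wraparound arithmetic."""
    return [a for a in INT8_VALUES if wrap_int8(a * k) == m]

def solutions_div(k, m):
    """All signed 8 bit A with A / k == m under truncating division."""
    return [a for a in INT8_VALUES if trunc_div(a, k) == m]

print(solutions_mul(2, 56))  # [-100, 28]
print(solutions_div(3, 4))   # [12, 13, 14]
```

| <p>The value -100 appears because -200 wraps to 56 in 8 bits, which is why the |
| range for A cannot be obtained by simply dividing the bound by the constant.</p> |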
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_9"> |
| <b>Statistics Based Checkers in the Clang Static Analyzer</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Balogh-Statistics_based_checkers_in_the_Clang_Static_Analyzer.pdf">Slides</a> ]<br> |
| <i>Ádám Balogh (Ericsson Hungary Ltd.)</i> |
| <p>In almost every development project there are some conventions that the |
| return value of some functions in an external library must be compared to some |
| extremal value, such as zero. For example, many integer functions return a |
| negative number in case of error, just as pointer functions return null |
| pointers. In a large project with many external functions it is virtually |
| impossible to formalize all these rules explicitly: they are either unwritten |
| or only exist in a natural language. To help enforce these rules, we created |
| checkers in the Clang Static Analyzer to explore these rules on a statistical |
| basis and check the code for them. We currently support two kinds of extremal |
| values: negative numbers for functions returning integers and null pointers for |
| functions returning pointers.</p> |
| <p>Example:</p> |
| <p>int i = may_return_negative();</p> |
| <p>v[i]; // error: negative indexing</p> |
| <p>Exploration and checking of these rules happen in two phases: in the first |
| phase we check every function call and create a summary for each function |
| recording the percentage of calls in which the return value is checked for negativeness (integer |
| functions) or nullness (pointer functions). If this percentage is above a |
| defined threshold (85% by default) we assume that the rule for the function |
| exists. The second phase is the usual execution of the analyzer where a checker |
| checks the code for violations of the rule: it splits the execution path into two |
| branches at the call of the listed functions, where the return value in one |
| branch is an extremal value (negative for integers or null for pointers) and |
| a non-extremal value on the other branch. Other checkers (e.g. the null-pointer |
| dereference checker) are expected to find errors on the extremal-value branch |
| if they are not terminated in the code by checking for the extremal-value. The |
| performance impact of the state-split is low: in at least 85% of the cases the |
| extremal-value branch is terminated quickly, in the remaining cases we expect |
| another checker to create a sink-node because of an error. The new checker is |
| under evaluation on open-source projects. We found some false positives; |
| however, their number can be reduced by including the arguments in the |
| statistics.</p> |
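| <p>The first phase described above amounts to a simple tally. The following |
| Python sketch shows one way it could look, assuming the call sites have |
| already been collected as (function name, was the return value checked) pairs. |
| The input format and all names are hypothetical; only the 85% default |
| threshold comes from the abstract.</p> |

```python
from collections import defaultdict

def infer_rules(call_sites, threshold=0.85):
    """Return the functions whose return value is checked often enough.

    call_sites: iterable of (function_name, was_checked) pairs, one per call.
    A rule is assumed when the checked fraction is above the threshold.
    """
    checked = defaultdict(int)
    total = defaultdict(int)
    for function_name, was_checked in call_sites:
        total[function_name] += 1
        if was_checked:
            checked[function_name] += 1
    return {
        name for name in total
        if checked[name] / total[name] > threshold
    }

# Hypothetical statistics: read_index is checked in 9 of 10 calls,
# get_count in only 5 of 10.
observed = (
    [("read_index", True)] * 9 + [("read_index", False)]
    + [("get_count", True)] * 5 + [("get_count", False)] * 5
)

rules = infer_rules(observed)
print(rules)  # {'read_index'}
```

| <p>In the second phase, only calls to the functions in the resulting set would |
| trigger the state split into an extremal and a non-extremal branch.</p> |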
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_10"> |
| <b>Flang Update</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Scalpone-Flang_update.pdf">Slides</a> ]<br> |
| <i>Steve Scalpone (NVIDIA / PGI / Flang)</i> |
| <p>An update about the current state of Flang, including a report on OpenMP 4.5 |
| target offload, Fortran performance and the new f18 front end.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_11"> |
| <b>Swinging Modulo Scheduling together with Register Allocation</b> |
| [ Video ] |
| [ Slides ]<br> |
| <i>Lama Saba (Intel)</i> |
| <p>VLIW architectures rely heavily on Modulo Scheduling to optimize ILP in |
| loops. Modulo Scheduling can be achieved today in LLVM using the |
| MachinePipeliner pass, which implements a Swing Modulo Scheduler prior to |
| register allocation [1]. For some VLIW architectures, such as those lacking |
| hardware interlocks or the ability to spill registers onto a stack, the |
| MachinePipeliner's decisions become crucial for the success of the register |
| allocation phase, since they affect the latter's decisions to generate |
| splits or spills, which in turn can result in an inefficient or even an |
| unsuccessful resource allocation.</p> |
| <p>Nevertheless, even though the MachinePipeliner aims to schedule with a |
| minimal Initiation Interval, it is structured in a way that facilitates trying |
| larger Initiation Intervals or a different ordering. This structure lends |
| itself to alternative, possibly less aggressive scheduling retries after more |
| aggressive attempts have failed in register allocation.</p> |
| <p>This talk introduces this issue and explores how we can achieve successful |
| modulo scheduling and register allocation for such architectures in LLVM by |
| introducing a repetitive rollback-and-retry mechanism for altering scheduling |
| decisions based on the register allocator's outcome, and how we can |
| leverage such an approach to improve the scheduling of VLIW architectures in |
| general.</p> |
| <p>[1] An Implementation of Swing Modulo Scheduling in a Production Compiler - |
| Brendon Cahoon - <a |
| href="http://llvm.org/devmtg/2015-10/slides/Cahoon-SwingModuloScheduling.pdf"> |
| http://llvm.org/devmtg/2015-10/slides/Cahoon-SwingModuloScheduling.pdf</a></p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_12"> |
| <b>LLVM for the Apollo Guidance Computer</b> |
| [ Video ] |
| [ Slides ]<br> |
| <i>Lewis Revill (University of Bath)</i> |
| <p>Nearly 50 years ago on the 20th of July 1969 humans set foot on the moon for |
| the first time. Among the many extraordinary engineering feats that made this |
| possible was the Apollo Guidance Computer, an innovative processor for its time |
| with an instruction set that was thought up well before the advent of C. So 50 |
| years later, why not implement support for it in a modern compiler such as |
| LLVM?</p> |
| <p>This talk will give a brief overview of some of the architectural features |
| of the Apollo Guidance Computer followed by an account of my implementation of |
| an LLVM target so far. The shortcomings of LLVM when it comes to implementing |
| such an unusual architecture will be discussed along with the workarounds used |
| to overcome them.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_13"> |
| <b>Catch dangling inner pointers with the Clang Static Analyzer</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Kovacs-Dangling_pointer_checker.pdf">Slides</a> ]<br> |
| <i>Réka Kovács (Eötvös Loránd University)</i> |
| <p>C++ container classes provide methods that return a raw pointer to the |
| container's inner buffer. When the container is destroyed, the inner buffer |
| is deallocated. A common bug is to use such a raw pointer after deallocation, |
| which may lead to crashes or other unexpected behavior.</p> |
| <p>This lightning talk will present a new Clang Static Analyzer checker |
| designed to address the above described problems, implemented last year as a |
| Google Summer of Code project. The checker has found serious problems in |
| popular open source projects with a negligible false positive rate. Future |
| plans include adding support for view-like constructs and non-STL |
| containers.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="LightningTalk_14"> |
| <b>Cross translation unit test case reduction</b> |
| [ Video ] |
| [ <a href="slides/Lightning-Kovacs-Cross_TU_reduction.pdf">Slides</a> ]<br> |
| <i>Réka Kovács (Eötvös Loránd University)</i> |
| <p>C-Reduce, released by Regehr et al. in 2012, is an excellent tool designed |
| to generate a minimal test case from a C/C++ file that has some specific |
| property (e.g. triggers a bug). One of the most interesting parts of C-Reduce |
| is Clang Delta, which is a set of compiler-like transformations implemented |
| using Clang libraries. Clang Delta includes transformations like changing a |
| function parameter to a global variable etc. </p> |
| <p>With the introduction of the experimental cross translation unit analysis |
| feature in the Clang Static Analyzer, there arose a need to investigate |
| crashes, bugs, or false positive reports that spread across different |
| translation units. Unfortunately, C-Reduce was designed to minimize one |
| translation unit at a time, and some of the Clang Delta transformations cannot |
| be applied to multiple TUs in their original form.</p> |
| <p>This talk/poster is a status report about a work in progress that aims to |
| make it possible to use C-Reduce for cross translation unit test case |
| reduction.</p> |
| </td></tr> |
| </table> |
| |
| <div class="www_sectiontitle" id="BoF">BoFs</div> |
| <table cellpadding="10"> |
| |
| <tr><td valign="top" id="BoF_1"> |
| <b>RFC: Towards Vector Predication in LLVM IR</b><br> |
| <i>Simon Moll (Saarland University), Sebastian Hack (Saarland University)</i> |
| <p>In this talk, we present the current state of the Explicit Vector Length |
| extension for LLVM. EVL is the first step towards proper predication and active |
| vector length support in LLVM IR. There has been a recent surge in vector ISAs, |
| be it the RISC-V V extension, ARM SVE or NEC SX-Aurora, all of which place |
| new demands on LLVM IR. Among their novel features are an active vector length, |
| full predication on all vector instructions and a register length that is |
| unknown at compile time. In this talk, we present the Explicit Vector Length |
| extension (EVL) for LLVM IR. EVL provides primitives that are practical for |
| both backends and IR-level automatic vectorizers. At the same time, EVL is |
| compatible with LLVM-SVE and even existing short SIMD ISAs stand to benefit |
| from its consistent handling of predication.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="BoF_2"> |
| <b>IPO --- Where are we, where do we want to go?</b><br> |
| <i>Johannes Doerfert (Argonne National Laboratory), Kit Barton (IBM Toronto Lab)</i> |
| <p>Interprocedural optimizations (IPOs) have been historically weak in LLVM. |
| The strong reliance on inlining can be seen as a consequence or cause. Since |
| inlining is not always possible (parallel programs) or beneficial (large |
| functions), the effort to improve IPO has recently seen an upswing again |
| [0,1,2]. In order to capitalize on this momentum, we would like to talk about the |
| current situation in LLVM, and goals for the immediate, but also distant, |
| future.</p> |
| <p>This open-ended discussion is not aimed at a particular group of people. We |
| expect to discuss potential problems with IPO, as well as desirable analyses |
| and optimizations. Both experts and newcomers are welcome to attend.</p> |
| <p>[0] <a |
| href="https://lists.llvm.org/pipermail/llvm-dev/2018-August/125537.html"> |
| https://lists.llvm.org/pipermail/llvm-dev/2018-August/125537.html</a></p> |
| <p>[1] These links do not yet exist but will be added later on.</p> |
| <p>[2] One link will be an RFC outlining missing IPO capabilities, the other |
| will point to a function attribute deduction rewrite patch (almost |
| finished).</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="BoF_3"> |
| <b>LLVM binutils</b> |
| [ <a href="http://lists.llvm.org/pipermail/llvm-dev/2019-April/132032.html">Notes</a> ] |
| [ <a href="slides/BoF-HendersonRupprecht-LLVM_binutils.pdf">Slides</a> ]<br> |
| <i>James Henderson (SN Systems), Jordan Rupprecht (Google)</i> |
| <p>LLVM has a suite of binary utilities that broadly mirror the GNU binutils |
| suite, with tools such as llvm-readelf, llvm-nm, and llvm-objcopy. These tools |
| are already widely used in testing the rest of LLVM, and are now starting to be |
| adopted as full replacements for the GNU tools in production environments.</p> |
| <p>This discussion will focus on what more needs to be done to make this |
| migration process easier, how far we need to go to make drop-in replacements |
| for the GNU tools, and what features people want to prioritize. Finally, we |
| will look at the broader future goals of these tools.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="BoF_4"> |
| <b>RFC: Reference OpenCL Runtime library for LLVM</b><br> |
| <i>Andrew Savonichev (Intel), Alexey Sachkov (Intel)</i> |
| <p>LLVM is used as a foundation for the majority of OpenCL compilers, thanks to |
| excellent support for the OpenCL C language in the Clang frontend and the |
| modularity of LLVM. Unfortunately, a compiler is not the only component that is |
| required to develop using OpenCL: users need a runtime library that implements |
| the OpenCL API. While several implementations of the OpenCL runtime exist, both |
| open and proprietary, they do not have community-wide adoption. This leads to |
| fragmentation and duplicated effort across the OpenCL community, and negatively |
| impacts the OpenCL ecosystem in general.</p> |
| <p>The purpose of this BoF is to bring together all parties interested in getting a |
| reference OpenCL Runtime implementation in LLVM that is designed to be easily |
| extendable to support various accelerator devices (CPU/GPU/FPGA/DSP) and to allow |
| users and compiler developers to rapidly prototype OpenCL specific |
| functionality in LLVM and Clang.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="BoF_5"> |
| <b>LLVM Interface Stability Guarantees BoF</b><br> |
| <i>Stephen Kelly</i> |
| <p>The goal of this BoF is to create the basis for a new page of documentation |
| enumerating the stability guarantees of interfaces exposed from LLVM |
| products.</p> |
| <p>There are some interfaces which are known to make no stability guarantees, |
| such as the Clang C++ API, others which make strict API guarantees, such as the |
| libclang C API, and still others, such as the LLVM IR API which is somewhere in |
| between. Only the latter appears in the LLVM Developer Policy. The rest |
| of the interface stability guarantees are mostly tribal knowledge.</p> |
| <p>A centralized location in the documentation for this information would |
| present guidelines for developers to follow when changing various parts of LLVM |
| code, and inform consumers what they can expect and rely upon when using |
| interfaces. This includes code interfaces and command line interfaces.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="BoF_6"> |
| <b>Clang Static Analyzer BoF</b><br> |
| <i>Devin Coughlin (Apple), Gabor Horvath (Eötvös Loránd University)</i> |
| <p>Let's discuss the present and future of the Clang Static Analyzer! |
| We'll start with a brief overview of analyzer features the community has |
| added over the last year. We'll then dive into a discussion of possible |
| focus areas for the next year, including potential deeper integration with |
| clang-tidy.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="BoF_7"> |
| <b>LLVM Numerics Improvements</b><br> |
| <i>Michael Berg (Apple), Steve Canon (Apple)</i> |
| <p>Some LLVM based compilers currently provide two modes of floating point code |
| generation. The first mode, called fast-math, is where performance is the |
| primary consideration over numerical precision and accuracy. This mode does not |
| strictly follow the IEEE-754 standard, but has proven useful for applications |
| that do not require this level of precision. The second mode, called |
| precise-math, is where the compiler carefully follows the subset of behavior |
| defined in the IEEE standard that is applicable to conforming hardware targets. |
| This mode is primarily used for compute workloads and wherever fast-math |
| precision is inadequate; however, it runs much slower as it requires a larger |
| number of instructions in general. In practice neither of these modes is |
| particularly desirable. The fast-math mode ignores a significant portion of the |
| standard as pertains to handling undefined values described as Not a Number |
| (NaNs) and Infinities (INFs), resulting in difficulties for certain workloads |
| when the hardware target computes these values correctly and performance |
| remains critical.</p> |
| <p>Until recently these two models were mutually exclusive; however, with the |
| addition of IR flags they need not be. For instance, the FastMath metadata |
| module flag drives behavior deemed numerically unsafe when it is enabled, by |
| indiscriminately enabling optimizations. With IR flags this behavior can be |
| enabled with much finer granularity, allowing various code forms to be fast or |
| precise together in one module. We call this mixed mode compilation. IR flags |
| can be used individually or paired to produce desired floating point behavior |
| under specified constraints with fine granularity of control. Optimization |
| passes have been modified under this new kind of control to produce this |
| behavior. This talk will describe the recent numerics work and discuss the |
| implications for front-ends and backends built with LLVM.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="BoF_8"> |
| <b>LLVM Foundation BoF</b><br> |
| <i>LLVM Foundation Board of Directors</i> |
| <p>Ask the LLVM Foundation Board of Directors anything, get program updates.</p> |
| </td></tr> |
| </table> |
| |
| <div class="www_sectiontitle" id="Poster">Posters</div> |
| <table cellpadding="10"> |
| |
| <tr><td valign="top" id="Poster_1"> |
| <b>Clava: C/C++ source-to-source from CMake using LARA</b> |
| [ <a href="slides/Poster-Bispo-Clava_CC++_source_to_source_from_CMake_using_LARA.pdf">Poster</a> ]<br> |
| <i>João Bispo (FEUP/INESCTEC)</i> |
| <p>Clava is a Clang-based source-to-source compiler that executes scripts |
| written in LARA, a superset of JavaScript with additional syntax for AST |
| analysis and transformation.</p> |
| <p>Clava intends to improve on Clang's source-to-source capabilities, by |
| providing a more convenient and powerful way to analyze, transform and generate |
| C/C++ code.</p> |
| <p>Although Clava is a stand-alone tool, we will present the Clava CMake |
| plug-in, which makes it easy to apply LARA scripts to C/C++ CMake projects. |
| Clava is open-source and runs on Linux, Windows and macOS.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_2"> |
| <b>Safely Optimizing Casts between Pointers and Integers</b> |
| [ <a href="slides/Poster-Lee-Safely_optimizing_casts_between_pointers_and_integers.pdf">Poster</a> ]<br> |
| <i>Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul |
| National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu |
| (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P. |
| Lopes (Microsoft Research, UK)</i> |
| <p>In this talk, a list of optimizations that soundly remove casts between |
| pointers and integers will be presented. In LLVM, a pointer is more than just |
| an integer: LLVM allows a pointer to track its underlying object, and the rule |
| to find it is defined as the based-on relation. This allows LLVM to aggressively |
| optimize loads and stores, but complicates the meaning of pointer-integer |
| casts. This causes conflicts between existing optimizations, resulting in |
| long-standing miscompilation bugs like 34548.</p> |
| <p>To fix it, we suggest disabling folding of inttoptr(ptrtoint(p)) to p and |
| using a safe workaround to remove these cast pairs. This optimization is |
| important because it removes a significant portion of them. We'll show that |
| even if the optimization is disabled, the majority of casts can be removed by |
| carefully adding new and modifying existing optimizations. After the |
| updates, the performance is still comparable to the original LLVM.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_3"> |
| <b>Scalar Evolution Canon: Click! Canonicalize SCEV and validate it by Z3 SMT solver!</b> |
| [ Poster ]<br> |
| <i>Lin-Ya Yu (Xilinx), Alexandre Isoard (Xilinx)</i> |
| <p>A scalar evolution (SCEV) is an analyzed expression. It represents how the |
| values of scalar variables change in a program when we execute the code[0]. It |
| is implemented as a pass and is widely used in many analyses and optimizations in |
| LLVM, such as loop strength reduction, induction variable substitution, and |
| memory access analysis. However, it is difficult to have a canonical form for |
| SCEV that can meet all other passes' needs. Here, we develop SCEV Canon to do |
| canonicalization and further simplification on SCEV.</p> |
| <p>A satisfiability modulo theories (SMT) solver from Microsoft Research, Z3, is |
| introduced in this work to verify the correctness of canonicalized SCEV. |
| Moreover, Z3 can also help us check the equivalence of SCEVs between different |
| SCEV implementations in different releases of LLVM. This poster shares the whole |
| process of how to canonicalize SCEV without modifying the scalar evolution |
| pass, and how to verify and test the generated SCEV. We also try to open a |
| discussion about some simplifications that can be done on SCEV.</p> |
| <p>[0] <a |
| href="https://subscription.packtpub.com/book/application_development/9781785280801/5/ch05lvl1sec36/scalar-evolution"> |
| https://subscription.packtpub.com/book/application_development/9781785280801/5/ch05lvl1sec36/scalar-evolution</a></p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_4"> |
| <b>Splendid GVN: Partial Redundancy Elimination for Algebraic Simplification</b> |
| [ <a href="slides/Poster-Her-SplendidGVN_Partial_redundancy_elimination.pdf">Poster</a> ]<br> |
| <i>Li-An Her (National Tsing Hua University), Jenq-Kuen Lee (National Tsing Hua University)</i> |
| <p>Modern computation in neural networks, signal processing for GPS and Wi-Fi, |
| image processing, etc., depends heavily on a large number of linear algebra |
| operations. Algebraic simplification improves performance for increasingly |
| complicated computations such as convolutions for CNNs and the Sobel operator, |
| and inner products for the discrete cosine transform and FFT in signal |
| processing. LLVM IR provides several optimization passes for algebraic |
| simplification, constant folding, copy propagation, etc. One is global value |
| numbering (GVN). These passes work well except when they encounter branches and |
| non-local cases. One such case is partial redundancy elimination (PRE): at |
| least two instructions are redundant or congruent, but they are in different |
| blocks. Even though eliminating one redundant instruction would not introduce a |
| logic error, the compiler lacks such a rule and skips the elimination. Thus, |
| algebraic simplification fails to optimize code when partial redundancy occurs. |
| GVN provides a PRE mechanism with lazy code motion, but it cannot provide more |
| accurate congruence information due to loops and Φ-nodes. New GVN handles such |
| cases and provides finer-grained congruence information, but it lacks a PRE |
| mechanism.</p> |
| <p>In this paper, we propose Splendid GVN, which adds a PRE mechanism to New |
| GVN on LLVM 7.0.0. When partial redundancy occurs, our pass checks safety and |
| applies hoisting code motion to eliminate it. The original GVN applies a less |
| accurate algorithm and can only perform lazy code motion, which risks |
| increasing code size. The original Hoist GVN cannot handle PRE and builds on GVN |
| instead of New GVN, so it cannot provide finer-grained information in the |
| presence of loops and may miss opportunities for further elimination. |
| Experiments show that Splendid GVN performs hoisting code motion for PRE on 2 |
| qualifying PRE programs from the LLVM test directory for GVN (available in |
| source code). Splendid GVN reduces total code size by 18.37% and 7% compared to |
| the original 2 programs and New GVN results.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_5"> |
| <b>An alternative OpenMP Backend for Polly</b> |
| [ <a href="slides/Poster-Halkenhauser-An_alternative_OpenMP_backend_for_Polly.pdf">Poster</a> ]<br> |
| <i>Michael Halkenhäuser (TU Darmstadt)</i> |
| <p>LLVM's polyhedral infrastructure framework Polly may automatically |
| exploit thread-level parallelism through OpenMP. Currently, the user can only |
| influence the number of utilized threads, while other OpenMP parameters such as |
| the scheduling type and chunk size are set to fixed values. This, in turn, |
| limits a user's ability to adapt the optimization process for a given |
| problem.</p> |
| <p>In this work, we present an alternative OpenMP backend for Polly, which |
| provides additional customization options to the user and is based on the LLVM |
| OpenMP runtime. We evaluate our new backend and the influence of the new |
| customization options on performance and compare to Polly's existing OpenMP |
| backend.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_6"> |
| <b>Does the win32 clang compiler executable really need to be over 21MB in size?</b> |
| [ <a href="slides/Poster-Gallop-Does_the_win32_clang_compiler_reaaly_nee_to_be_over_21MB_in_size.png">Poster</a> ]<br> |
| <i>Russell Gallop (SN Systems), Greg Bedwell (SN Systems)</i> |
| <p>The title of this lightning talk is from a bug filed in the early days of the |
| PS4 compiler. It noted that the LLVM-based PS4 compiler was more than 3 times |
| larger than the PS3 compiler. Since then it has almost doubled to over 40MB. |
| For a compiler which targets one system this seems excessive. A large |
| executable can hurt cache performance and costs time when it is transferred for |
| distributed builds.</p> |
| <p>In this lightning talk I will look at where this comes from and how it can |
| be managed.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_7"> |
| <b>Enabling Multi- and Cross-Language Verification with LLVM</b> |
| [ <a href="slides/Poster-Garzella-Enabling_multi_and_cross_language_verification_with_LLVM.pdf">Poster</a> ]<br> |
| <i>Jack J. Garzella (University of Utah), Marek Baranowski (University of Utah), |
| Shaobo He (University of Utah), Zvonimir Rakamaric (University of Utah)</i> |
| <p>Developers nowadays regularly use numerous programming languages with |
| different characteristics and trade-offs. Unfortunately, implementing a |
| software verifier for a new language from scratch is a large and tedious |
| undertaking, requiring expert knowledge in multiple domains, such as compilers, |
| verification, and constraint solving. Hence, only a tiny fraction of the used |
| languages has readily available software verifiers to aid in the development of |
| correct programs. In the past decade, there has been a trend of leveraging |
| popular compiler intermediate representations (IRs), such as LLVM IR, when |
| implementing software verifiers. The main advantage is to avoid implementing |
| large front-ends, and instead rely on a typically simple canonical format of an |
| IR. In addition, processing IR promises out-of-the-box multi- and |
| cross-language verification since, at least in theory, a verifier ought to be |
| able to handle a program in any programming language (and their combination) |
| that can be compiled into the IR. In practice though, to the best of our |
| knowledge, nobody has explored the feasibility and ease of such integration of |
| new languages. This talk introduces a methodology for adding support for a new |
| language into an IR-based verification toolflow. Using our methodology, we |
| extend an existing verifier called SMACK with support for 7 additional |
| languages. We assess the quality of our extensions and the proposed methodology |
| through several case studies, and we describe the lessons we learned in the |
| process.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_8"> |
| <b>Instruction Tracing and dynamic codegen analysis to identify unique llvm performance issues.</b> |
| [ <a href="slides/Poster-Biplop-Tracing.pdf">Poster</a> ]<br> |
| <i>Biplob (IBM)</i> |
| <p>Performance analysis of the machine code generated by a compiler can be |
| carried out in different ways and can also depend on the application in question. |
| Common methods use some form of profiling on a running program, which generally |
| provides statistical information about certain data and events. While this |
| method does give important insights into a performance problem, some |
| issues are more clearly understood when the compiled application is actually |
| run and the dynamic instructions of hot code execution paths are traced and |
| analyzed in a small execution window. Trace records contain instructions and |
| data, memory addresses and other information which provide complete visibility |
| into the workings of an application.</p> |
| <p>While tracing is very useful in micro-architecture analysis, we will stick to |
| how these traces can benefit compiler performance analysis. In this talk we |
| will look at some of the code-gen issues that were better identified when |
| running applications compiled by LLVM and other compilers were traced for hot |
| code sections on the IBM Power9 processor.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_9"> |
| <b>Handling all Facebook requests with JITed C++ code</b> |
| [ <a href="slides/Poster-Zhou-Handling_all_Facebook_requests_with_JITed_C++_code.pdf">Poster</a> ]<br> |
| <i>Huapeng Zhou (Facebook), Yuhan Guo (Facebook)</i> |
| <p>Facebook needs an efficient scripting framework to enable fast iteration of |
| HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine and |
| code deployment ecosystem was created to compile/link/execute C++ scripts at |
| run-time, using Clang and LLVM ORC APIs. The framework allows developers to |
| write business logic and unit tests in C++ scripts, as well as debug using GDB. |
| Profiling using perf is also supported for PGO purposes. This new framework |
| outperformed another previously used scripting language by up to 4X, measured |
| in execution time.</p> |
| <p>In order to power the C++ scripts in an ABI-compatible way, a PCH (pre-compiled |
| header) is built statically to provide declarations and definitions of |
| necessary dependent types and methods. Clang APIs are then used at run-time to |
| transform source code to LLVM IR, which is later passed through LLVM ORC |
| layers for linking/optimizing. The Clang/LLVM toolchains above are statically |
| linked into the main binary to ensure compatibility between the PCH and C++ |
| scripts. As a result, scripts can be deployed in real time without any main |
| binary change.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_10"> |
| <b>Implementing SPMD control flow in LLVM using reconverging CFGs</b> |
| [ <a href="slides/Poster-Wahlster-Implementing_SPMD_control_flow_in_LLVM_using_reconverging_CFG.pdf">Poster</a> ]<br> |
| <i>Fabian Wahlster (Technische Universität München), Nicolai Hähnle (Advanced Micro Devices)</i> |
| <p>Compiling programs for an SPMD execution model, e.g. for GPUs or for whole |
| program vectorization on CPUs, requires a transform from the thread-level input |
| program into a vectorized wave-level program in which the values of the |
| original threads are stored in corresponding lanes of vectors. The main |
| challenge of this transform is handling divergent control flow, where threads |
| take different paths through the original CFG. A common approach, which is |
| currently taken by the AMDGPU backend in LLVM, is to first structurize the |
| program as a simplification for subsequent steps.</p> |
| <p>However, structurization is overly conservative. It can be avoided when |
| control flow is uniform, i.e. not divergent. Even where control flow is |
| divergent, structurization is often unnecessary. Moreover, LLVM's |
| StructurizeCFG pass relies on region analysis, which limits the extent to which |
| it can be evolved.</p> |
| <p>We propose a new approach to SPMD vectorization based on saying that a CFG |
| is reconverging if for every divergent branch, one of the successors is a |
| post-dominator. This property is weaker than structuredness, and we show that |
| it can be achieved while preserving uniform branches and inserting fewer new |
| basic blocks than structurization requires. It is also sufficient for code |
| generation, because it guarantees that threads which "leave" a wave |
| at divergent branches will be able to rejoin it later.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_11"> |
| <b>LLVM for the Apollo Guidance Computer</b> |
| [ Poster ]<br> |
| <i>Lewis Revill (University of Bath)</i> |
| <p>Nearly 50 years ago on the 20th of July 1969 humans set foot on the moon for |
| the first time. Among the many extraordinary engineering feats that made this |
| possible was the Apollo Guidance Computer, an innovative processor for its time |
| with an instruction set that was thought up well before the advent of C. So 50 |
| years later, why not implement support for it in a modern compiler such as |
| LLVM?</p> |
| <p>This talk will give a brief overview of some of the architectural features |
| of the Apollo Guidance Computer followed by an account of my implementation of |
| an LLVM target so far. The shortcomings of LLVM when it comes to implementing |
| such an unusual architecture will be discussed along with the workarounds used |
| to overcome them.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_12"> |
| <b>LLVM Miner: Text Analytics based Static Knowledge Extractor</b> |
| [ <a href="slides/Poster-Hameeza-LLVM_Miner_Text_Analytics_based_Static_Knowledge_Extractor.pdf">Poster</a> ]<br> |
| <i>Hameeza Ahmed (NED University of Engineering and Technology), Muhammad Ali |
| Ismail (NED University of Engineering and Technology)</i> |
| <p>A compiler converts high-level language code into assembly language while |
| applying optimizations. A compiler has three phases: front end, |
| middle end and back end. Low Level Virtual Machine (LLVM) is an open source |
| framework that provides all three of these stages. One of the reasons for the |
| wide adoption of LLVM is its powerful optimizer, or middle end stage. There |
| are various opportunities to optimize the Intermediate Representation (IR) |
| code generated by the front end. Before any optimization is applied, significant |
| effort is dedicated to detailed analysis of the given IR in order to extract |
| static information hidden in the source code.</p> |
| <p>Until now, the standard mechanism for analyzing IR code has been to use |
| analysis passes written in LLVM itself. Each time some information is required |
| from the IR, a pass is written or reused in LLVM core syntax. This approach has |
| proved to be complex for novice programmers who are unfamiliar with the LLVM |
| coding style and its advanced C++ concepts. As a result, a significant amount of |
| time is spent on learning LLVM programming rather than doing the required |
| compile-time code analysis. In this regard, an easier mechanism is needed to |
| perform static code analysis in LLVM.</p> |
| <p>In this work, LLVM miner is presented to simplify static IR-level analysis |
| in the LLVM compiler. LLVM miner performs text analytics to extract |
| relevant information from the given IR code. The IR generated by the front end is |
| passed through the proposed miner, where hidden static features are extracted |
| easily. The proposed approach has been tested using a set of 5 mixed benchmark |
| codes, namely bfs, connected components, grep, histogram, and kmeans. The |
| experiments are conducted using an R script to determine the instruction |
| frequency and the application trend. Instruction frequency shows the count of each |
| instruction in the given IR code. It is represented by means of a bar graph and a word |
| cloud. The application trend is then obtained by clustering individual instructions |
| into categories such as branch, compute, function calls, I/O read/write, |
| memory consumption, and memory read/write operations. |
| The application trend shows the proportion of different classes of operations in a |
| given code using bar graphs. It lets us determine whether an application is |
| compute bound, memory bound, or I/O bound by using static code-level |
| features. The analysis of LLVM IR using text mining techniques appears to be a |
| promising direction for studying significant features hidden in source |
| code. Text analytics of the given IR is expected to be an easier and less |
| costly solution, in terms of both time and effort, than |
| conventional LLVM analysis passes.</p> |
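| <p>As an illustration only, the instruction frequency and application trend |
| described above can be approximated with a few lines of text processing. The |
| sketch below is in Python rather than the R script used by the authors, and the |
| regular expression, the <code>CATEGORIES</code> mapping, and both function |
| names are assumptions made for this example, not part of LLVM miner.</p> |

```python
import re
from collections import Counter

# Hypothetical grouping of opcodes into coarse classes. The categories in the
# abstract are richer (for example I/O); this mapping is only an assumption.
CATEGORIES = {
    "branch": {"br", "switch", "ret"},
    "compute": {"add", "sub", "mul", "fadd", "fsub", "fmul", "fdiv", "icmp", "fcmp"},
    "call": {"call", "invoke"},
    "memory": {"load", "store", "alloca", "getelementptr"},
}

# An indented line, an optional "%name = " result, an optional tail-call
# marker, and then the opcode word.
INSTR_RE = re.compile(
    r"^\s+(?:%[\w.]+\s*=\s*)?(?:tail\s+|musttail\s+|notail\s+)?([a-z_]+)\b"
)


def instruction_frequency(ir_text):
    """Count opcodes in textual LLVM IR by scanning it line by line."""
    counts = Counter()
    for line in ir_text.splitlines():
        match = INSTR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def application_trend(counts):
    """Return the share of each opcode category among all counted opcodes."""
    totals = Counter()
    for opcode, n in counts.items():
        category = next(
            (name for name, ops in CATEGORIES.items() if opcode in ops), "other"
        )
        totals[category] += n
    total = sum(totals.values())
    return {name: n / total for name, n in totals.items()} if total else {}


SAMPLE_IR = """
define i32 @f(i32 %a, i32 %b) {
entry:
  %p = alloca i32
  store i32 %a, i32* %p
  %v = load i32, i32* %p
  %s = add i32 %v, %b
  %c = icmp sgt i32 %s, 0
  br i1 %c, label %pos, label %neg
pos:
  %r = call i32 @g(i32 %s)
  ret i32 %r
neg:
  ret i32 0
}
"""

freq = instruction_frequency(SAMPLE_IR)
trend = application_trend(freq)
```

| <p>A pattern-based scan like this is deliberately shallow: it never builds |
| the IR in memory, which is what makes it approachable, but it can miscount |
| unusual syntax such as quoted value names.</p> |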
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_13"> |
| <b>Function Merging by Sequence Alignment</b> |
| [ <a href="slides/Poster-Rocha-Function_merging_by_sequence_alignment.pdf">Poster</a> ]<br> |
| <i>Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of Edinburgh), Hugh Leather (University of Edinburgh)</i> |
| <p>Resource-constrained devices for embedded systems are becoming increasingly |
| important. In such systems, memory is highly restrictive, making code size in |
| most cases even more important than performance. Compared to more traditional |
| platforms, memory is a larger part of the cost and code occupies much of it. |
| Despite that, compilers make little effort to reduce code size. One key |
| technique attempts to merge the bodies of similar functions. However, |
| production compilers only apply this optimization to identical functions, while |
| research compilers improve on that by merging the few functions with identical |
| control-flow graphs and signatures. Overall, existing solutions are |
| insufficient and we end up having to either increase cost by adding more memory |
| or remove functionality from programs.</p> |
| <p>We introduce a novel technique that can merge arbitrary functions through |
| sequence alignment, a bioinformatics algorithm for identifying regions of |
| similarity between sequences. We combine this technique with an intelligent |
| exploration mechanism to direct the search towards the most promising function |
| pairs. Our approach is more than 2.4x better than the state-of-the-art, |
| reducing code size by up to 25%, with an overall average of 6%, while |
| introducing an average compilation-time overhead of only 15%. When aided by |
| profiling information, this optimization can be deployed without any |
| significant impact on the performance of the generated code.</p> |
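| <p>To make the idea of aligning two functions concrete, here is a minimal |
| sketch that aligns two opcode sequences with a textbook global alignment |
| (Needleman-Wunsch). Choosing that particular algorithm, the scoring values, |
| and the <code>align</code> helper are assumptions for illustration; the sketch |
| does not reproduce the authors' implementation and does no actual merging.</p> |

```python
def align(a, b, match=2, mismatch=-1, gap=-1):
    """Globally align sequences a and b; None marks a gap on one side."""

    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Hypothetical opcode sequences standing in for two similar function bodies.
f1 = ["load", "add", "mul", "store", "ret"]
f2 = ["load", "add", "store", "ret"]

alignment = align(f1, f2)
shared = sum(1 for x, y in alignment if x == y)
```

| <p>In this toy input, the aligned pairs with equal opcodes are the candidates |
| for shared code in a merged function, while the gap marks an instruction that |
| only one of the two functions needs.</p> |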
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_14"> |
| <b>Compilation and optimization with security annotations</b> |
| [ <a href="slides/Poster-TuanVu-Compilation_and_optimization_with_security_annotations.pdf">Poster</a> ]<br> |
| <i>Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM), |
| Albert Cohen (Google)</i> |
| <p>Program analysis and program transformation systems need to express |
| additional program properties, to specify test and verification goals, and to |
| enhance their effectiveness. Such annotations are typically inserted into the |
| representation on which the tool operates; e.g., source level for establishing |
| compliance with a specification, and binary level for the validation of secure |
| code. While several annotation languages have been proposed, these typically |
| target the expression of functional properties. For the purpose of implementing |
| secure code, there has been little effort to support non-functional properties |
| about side-channels or faults. Furthermore, analyses and transformations making |
| use of such annotations may target different representations encountered along |
| the compilation flow.</p> |
| <p>We extend an annotation language to express a wider range of functional and |
| non-functional properties, enabling security-oriented analyses and influencing |
| the application of code transformations along the compilation flow. We |
| translate this language to the different compiler representations from abstract |
| syntax down to binary code. We explore these concepts through the design and |
| implementation of an optimizing, annotation-aware compiler, capturing |
| annotations from the program source, propagating and emitting them in the |
| binary, so that binary-level analysis tools can use them.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_16"> |
| <b>Leveraging Polyhedral Compilation in Chapel Compiler</b> |
| [ <a href="slides/Poster-Yerawar-ChapelPolly.pdf">Poster</a> ]<br> |
| <i>Sahil Yerawar (IIT Hyderabad), Siddharth Bhat (IIIT Hyderabad), Michael Ferguson (Cray Inc.), Philip Pfaffe (Karlsruhe Institute of Technology), Ramakrishna Upadrasta (IIT Hyderabad)</i> |
| <p>Chapel is an emerging parallel programming language developed with the aim |
| of providing better performance in High-Performance Computing as well as |
| accessibility for newcomer programmers. It relies on LLVM as one of its |
| backends. This talk shows how the polyhedral compilation techniques available |
| in Polly are utilized by the Chapel compiler. We will share our experience of |
| using Polly's loop optimizer in a new setting with Polly and LLVM |
| developers. In particular, the talk will discuss how the Chapel compiler can |
| benefit from the optimizations available in Polly, including GPGPU code |
| generation.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_17"> |
| <b>LLVM on AVR - textual IR as a powerful tool for making "impossible" compilers</b> |
| [ Poster ]<br> |
| <i>Carl Peto (Swift for Arduino/Petosoft)</i> |
| <p>To be demonstrated on stage and available to use and test, I have built a |
| prototype compiler for a subset of the Swift language targeting the Arduino UNO |
| platform, which is a radically different use for the language. This was |
| achieved despite the Swift compiler and front end having limited support for |
| such a different back end.</p> |
| <p>Key to the success was separation of the first part of the compilation into |
| textual LLVM IR (using a standard toolchain), followed by compilation from LLVM |
| IR files into machine code using a custom-built llc. This approach improves |
| debugging, especially of deployed products, and separation of concerns. |
| Ultimately it could be used as a template for other "impossible" |
| compilers such as Swift to WebAssembly, Go to OpenGL shaders and more.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_18"> |
| <b>Vectorizing Add/Sub Expressions with SLP</b> |
| [ <a href="slides/Poster-Porpodas-Supernode_SLP.pdf">Poster</a> ]<br> |
| <i>Vasileios Porpodas (Intel Corporation, USA), Rodrigo C. O. Rocha (University |
| of Edinburgh, UK), Evgueni Brevnov (Intel Corporation, USA), Luís F. |
| W. Góes (PUC Minas, Brazil), Timothy Mattson (Intel Corporation, |
| USA)</i> |
| <p>The SLP Vectorizer is LLVM's second vectorizer (after the Loop |
| Vectorizer). It performs auto-vectorization of straight-line code. It works by |
| first exploring the scalar code for vectorizable patterns (groups), and then by |
| replacing each group with its vectorized form. </p> |
| <p>This talk presents the existing design of the SLP vectorizer and shows how |
| it fails to vectorize simple IR inputs with Add/Sub (or Mul/Div) expression |
| trees. We propose specific improvements to the current design that will let us |
| effectively handle such code. We named this design SuperNode SLP (SN-SLP) |
| because it extends the SLP graph to include new "fat" nodes that |
| include multiple instructions. This talk also presents our detailed plan for |
| upstreaming the bulk of this work in a sequence of patches.</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_19"> |
| <b>Adding support for C++ contracts to Clang</b> |
| [ <a href="slides/Poster-Lopez-Gomez-Adding_support_for_C++_contracts_to_Clang.pdf">Poster</a> ]<br> |
| <i>Javier López-Gómez (University Carlos III of Madrid), J. |
| Daniel García (University Carlos III of Madrid)</i> |
| <p>A language that supports contract checking allows programming errors to be |
| detected. Also, making this information available to the compiler may enable |
| it to perform additional optimizations.</p> |
| <p>This paper presents our implementation of the P0542R5 technical |
| specification (now part of the C++20 working draft).</p> |
| </td></tr> |
| |
| <tr><td valign="top" id="Poster_20"> |
| <b>Optimizing Nondeterminacy: Exploiting Race Conditions in Parallel Programs</b> |
| [ Poster ]<br> |
| <i>William S. Moses (MIT CSAIL)</i> |
| <p>As computation moves towards parallel programming models, writing efficient |
| parallel programs becomes paramount. As a result, there have been several |
| efforts (Tapir, HPVM, among others) to augment serial compilers such as LLVM to |
| have a first-class representation of parallelism. Such representations |
| theoretically permit the compiler to both analyze and optimize parallel |
| programs.</p> |
| <p>A major difference between serial and parallel programs is that in many |
| parallel runtimes, one cannot make any assumptions about the ordering of |
| various logical tasks. This nondeterminism creates an opportunity for the |
| compiler. Since any ordering is valid, the compiler can also reorder tasks if |
| it believes this to be beneficial.</p> |
| <p>This talk will discuss how the compiler can take advantage of this |
| nondeterminacy through a number of example optimizations, taking a look at |
| their theoretical implications as well as how they perform when implemented |
| atop the Tapir extension to LLVM.</p> |
| </td></tr> |
| </table> |
| |
| <!-- *********************************************************************** --> |
| |
| <!--#include virtual="sponsors.incl" --> |
| |
| <hr> |
| |
| <!--#include virtual="../../footer.incl" --> |