devmtg/2019-04/talks.html - llvm-www - Git at Google

 <!--#include virtual="../../header.incl" -->

 <div class="www_sectiontitle" id="top">2019 European LLVM Developers Meeting</div>
 <div style="float:left; width:68%;">
 <div style="width:100%;">
 <ul>
   <li><a href="index.html">Conference main page</a></li>
   <li><b>Conference Dates</b>: April 8-9, 2019</li>
   <li><b>Location</b>: <a href="https://www.leplaza-brussels.be/en/">
        Le Plaza Brussels, 118 - 126 boulevard Adolphe Max, 1000 Brussels, Belgium</a></li>
 </ul>
 </div>

 <div class="www_sectiontitle" id="about">About</div>
 <p>The meeting serves as a forum for LLVM, Clang, LLDB and other LLVM project
 developers and users to get acquainted, learn how LLVM is used, and exchange
 ideas about LLVM and its (potential) applications. <p>

 <p>The conference includes:
 <ul>
 <li><a href="#Keynote">Keynote</a></li>
 <li><a href="#Tutorial">Tutorials</a></li>
 <li><a href="#Talk">Technical talks</a></li>
 <li><a href="#LightningTalk">Lightning talks</a></li>
 <li><a href="#BoF">BoFs</a></li>
 <li><a href="#Poster">Poster session</a></li>
 <li>and a reception. </li>
 </ul>
 </p>

 <!-- *********************************************************************** -->

 <div class="www_sectiontitle" id="Keynote">Keynote</div>
 <table cellpadding="10">

 <tr><td valign="top" id="Keynote_1">
 <b>MLIR: Multi-Level Intermediate Representation for Compiler Infrastructure</b>
  [ <a href="https://www.youtube.com/watch?v=qzljG6DKgic">Video</a> ]
  [ <a href="slides/Keynote-ShpeismanLattner-MLIR.pdf">Slides</a> ]<br>
 <i>Tatiana Shpeisman (Google), Chris Lattner (Google)</i>
 <p>This talk will give an overview of Multi-Level Intermediate Representation -
 a new intermediate representation designed to provide a unified, flexible and
 extensible intermediate representation that is language-agnostic and can be
 used as a base compiler infrastructure.  MLIR shares similarities with
 traditional CFG-based three-address SSA representations (including LLVM IR or
 SIL), but it also introduces notions from the polyhedral domain as first class
 concepts. The notion of dialects is a core concept of MLIR extensibility,
 allowing multiple levels in a single representation. MLIR supports the
 continuous lowering from dataflow graphs to high-performance target specific
 code through partial specialization between dialects. We will illustrate in
 this talk how MLIR can be used to build an optimizing compiler infrastructure
 for deep learning applications.</p>
 <p>MLIR supports multiple front- and back-ends and uses LLVM IR as one of its
 primary code generation targets. MLIR also relies heavily on design principles
 and practices developed by the LLVM community. For example, it depends on LLVM
 APIs and programming idioms to minimize IR size and maximize optimization
 efficiency. MLIR uses LLVM testing utilities such as FileCheck to ensure robust
 functionality at every level of the compilation stack, TableGen to express IR
 invariants, and it leverages LLVM infrastructure such as dominance analysis to
 avoid implementing all the necessary compiler functionalities from scratch. At
 the same time, it is a brand new IR, both more restrictive and more general
 than LLVM IR in different aspects of its design. We believe that the LLVM
 community will find in MLIR a useful tool for developing new compilers,
 especially in machine learning and other high-performance domains.</p>
 </td></tr>
 </table>

 <div class="www_sectiontitle" id="Talk">Technical talks</div>
 <table cellpadding="10">

 <tr><td valign="top" id="Talk_1">
 <b>Switching a Linux distribution's main toolchains to LLVM/Clang</b>
  [ Video ]
  [ <a href="slides/TechTalk-Rosenkranzer-Switching_a_Linux_distro.pdf">Slides</a> ]<br>
 <i>Bernhard Rosenkr&auml;nzer (Linaro, OpenMandriva, LinDev)</i>
 <p>OpenMandriva is the first general-purpose Linux distribution that has
 switched its primary toolchain to Clang -- this talk will give an overview of
 what we did, what problems we&#39;ve faced, and where we&#39;re still having
 problems (usually worked around by using gcc for some packages).</p>
 </td></tr>

 <tr><td valign="top" id="Talk_2">
 <b>Just compile it: High-level programming on the GPU with Julia</b>
  [ Video ]
  [ <a href="slides/TechTalk-Besard-Just_compile_it_high_level_programming_on_the_GPU_with_Julia.pdf">Slides</a> ]<br>
 <i>Tim Besard (Ghent University)</i>
 <p>High-level programming languages often rely on interpretation or compilation
 schemes that are ill-suited for hardware accelerators like GPUs: These devices
 typically require statically compiled, straight-line code in order to reach
 acceptable performance. The high-level Julia programming language takes a
 different approach, by combining careful language design with an LLVM-based JIT
 compiler to generate high-quality machine code.</p>
 <p>In this talk, I will show how we&#39;ve used that capability to build a GPU
 back-end for the Julia language, and explain the underlying techniques that
 make it happen, including a high-level Julia wrapper for the LLVM libraries,
 and interfaces to share functionality with the existing Julia code generator. I
 will also demonstrate some of the powerful abstractions that we have built on
 top of this infrastructure.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_3">
 <b>The Future of AST Matcher-based Refactoring</b>
  [ Video ]
  [ <a href="slides/TechTalk-Kelly-The_future_of_AST_matcherbased_refactoring.pdf">Slides</a> ]<br>
 <i>Stephen Kelly</i>
 <p>In the last few years, Clang has opened up new possibilities in C++ tooling
 for the masses. Tools such as clang-tidy and clazy offer ready-to-use
 source-to-source transformations. Available transformations can be used to
 modernize (use newer C++ language features), improve readability (remove
 redundant constructs), or improve adherence to the C++ Core Guidelines.</p>
 <p>However, when special needs arise, maintainers of large codebases need to
 learn some of the Clang APIs to create their own porting aids. The Clang APIs
 necessarily form a more-exact picture of the structure of C++ code than most
 developers keep in their heads, and bridging the conceptual gap can be a
 daunting task.</p>
 <p>This talk will show tools and features which make this task easier for
 developers, ranging from <ul>
 <li>Improvements to the clang-query interpreter</li>
 <li>Improvements to the AST Matcher API</li>
 <li>Information essential to creating clang-tidy-checks</li>
 <li>Debugging and profiling of AST Matchers</li>
 <li>Advanced tooling </li>
 </ul>
 </p>
 <p>These features are in various stages along the way to being upstreamed to
 Clang. They enable new possibilities for large-scale refactoring in a
 reasonable timeframe by solving problems of API discovery, guiding users in
 creating working refactorings.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_4">
 <b>A compiler approach to Cyber-Security</b>
  [ Video ]
  [ <a href="slides/TechTalk-Ferriere-A_compiler_approach_to_cybersecurity.pdf">Slides</a> ]<br>
 <i>Fran&ccedil;ois de Ferri&egrave;re (STMicroelectronics)</i>
 <p>STMicroelectronics is developing LLVM-based compilation tools for its
 proprietary processors and also for the ARM cores. Applications, among which an
 increasing number of IOTs developments, require more and more security
 implemented either in hardware or software, or both. To implement complex and
 reliable software countermeasures that can be deployed in a timely manner, we
 are adding specific cybersecurity code-generation features in our production
 LLVM compiler, that we present in this talk.</p>
 <p>We give implementation details on how we worked into Clang and LLVM to
 implement these techniques and we explain how they contribute to reinforce the
 software protection. We also detail how we can restrict these transformations
 to specific safety-critical regions of a program to meet the industrial
 constraints on performance and code size of our applications.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_5">
 <b>Compiler Optimizations for (OpenMP) Target Offloading to GPUs</b>
  [ Video ]
  [ <a href="slides/TechTalk-Doerfert-Compiler_optimization_for_OpenMP_accelerator_offloading.pdf">Slides</a> ]<br>
 <i>Johannes Doerfert (Argonne National Laboratory), Hal Finkel (Argonne National Laboratory)</i>
 <p>The support of OpenMP target offloading in Clang is steadily increasing.
 However, when it comes to the optimization of such codes, LLVM is still doing a
 horrible job. Early separation into different modules and state machine
 generation are only two reasons why the middle and backend have a hard time
 generating efficient code.</p>
 <p>In this talk, we want to focus on code offloading to GPUs (through OpenMP),
 an increasingly important part of modern programming. We will first highlight
 different reasons for missing optimizations and poor code quality before we
 introduce new <em>practical</em> solutions. While our implementation is still
 experimental, early results suggest that there is enormous optimization
 potential in both manually written, and automatically generated, target
 offloading code.</p>
 <p>In addition to the talk, we will, closer to the conference date, initiate a
 discussion on the LLVM mailing list and publish our implementation.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_6">
 <b>Handling massive concurrency: Development of a programming model for GPU and CPU</b>
  [ Video ]
  [ <a href="slides/TechTalk-Liedtke-Handling_massive_concurrency.pdf">Slides</a> ]<br>
 <i>Matthias Liedtke (SAP)</i>
 <p>For efficient parallel execution it is necessary to write massively
 concurrent algorithms and to optimize memory access.  In this session we show
 our approach of a programming model that is able to execute the same concurrent
 algorithm efficiently on GPUs and CPUs: Similar to OpenMP it allows the
 programmer to describe concurrency and memory access declaratively but hides
 complexity like memory transfers between the CPU and the GPU. In comparison to
 OpenMP our model provides a higher level of expressiveness which enables us to
 reach a performance comparable to OpenCL/CUDA.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_7">
 <b>Automated GPU Kernel Fusion with XLA</b>
  [ Video ]
  [ <a href="slides/TechTalk-Joerg-Automated_GPU_Kernel_Fusion_with_XLA.pdf">Slides</a> ]<br>
 <i>Thomas Joerg (Google)</i>
 <p>XLA (Accelerated Linear Algebra) is an optimizing compiler for linear
 algebra that accelerates TensorFlow computations. The XLA compiler lowers to
 LLVM IR and relies on LLVM for low-level optimization and code generation. XLA
 achieves significant performance gains on TensorFlow models. We observed
 speedups of up to 3x on internal models. The popular image classification model
 ResNet-50 trains 1.6x faster.</p>
 <p>A key optimization performed by XLA is automated GPU kernel fusion. The idea
 is to combine multiple linear algebra operators into a single GPU kernel to
 reduce memory bandwidth requirements and kernel launch overhead. TensorFlow
 with XLA demonstrated competitive performance on MLPerf benchmarks (mlperf.org)
 compared to ML frameworks that rely on manually fused, hand-tuned GPU
 kernels.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_8">
 <b>The Helium Haskell compiler and its new LLVM backend</b>
  [ Video ]
  [ <a href="slides/TechTalk-de-Wolff-Helium_Haskell_compiler.pdf">Slides</a> ]<br>
 <i>Ivo Gabe de Wolff (University of Utrecht)</i>
 <p>Helium, developed at the University of Utrecht, is a compiler for the
 functional, lazy language Haskell. It is used for research on error diagnosis
 and teaching. In this talk we will however focus on the new LLVM backend and
 the compilation of high level features like lambdas, laziness (call-by-need
 semantics), currying (partial application). Furthermore we discuss some high
 level optimizations which cannot be done at LLVM-level.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_9">
 <b>Testing and Qualification of Optimizing Compilers for Functional Safety</b>
  [ Video ]
  [ <a href="slides/TechTalk-Cabrelles-Testing_and_qualification_of_optimizing_compilers_for_functional_safety.pdf">Slides</a> ]<br>
 <i>Jos&eacute; Luis March Cabrelles (Solid Sands)</i>
 <p>In the development of embedded applications, the compiler plays a crucial
 role in the translation from source to machine code. If the application is
 safety-critical, functional safety standards such as ISO 26262 for the
 automotive industry require that the user of the compiler develops confidence
 in the compilers correct operation. In this presentation we will discuss the
 requirements of ISO 26262 on tools such as LLVM compilers and how they can be
 met with a testing procedure that works well with the V-Model of
 engineering.</p>
 <p>As the name implies, functional safety standards deal with specified
 functionality of components. But what about the optimizations that a LLVM-based
 compiler applies to the program, sometimes even silently? Optimizations are not
 even mentioned in the language standards for C and C++ - they are
 &quot;non-functional&quot; behavior of the compiler. As we will demonstrate,
 ignoring optimizations will lead to significant holes in the compiler&#39;s
 test coverage. We will show how we have developed a technique that achieves
 good results with optimization testing and have some errors in Intel&#39;s
 well-regarded Clang-based compiler to show. To show the completeness of our
 method for the requirements of functional safety, we have analyzed how the
 tests match with the various LLVM IR-level transformation passes that they go
 through.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_10">
 <b>Improving Debug Information in LLVM to Recover Optimized-out Function Parameters</b>
  [ Video ]
  [ <a href="slides/TechTalk-Prica-Improving_LLVM_DebugInfo.pdf">Slides</a> ]<br>
 <i>Nikola Prica (RT-RK), Djordje Todorovic (RT-RK), Ananthakrishna Sowda (CISCO), Ivan Baev (CISCO)</i>
 <p>Software release products are compiled with optimization level &#45;O2 and
 higher. Such products might produce a core-file that is used for investigating
 cause of problem that produced it. First thing from which we start debug
 analysis is call-trace from a crash. In such traces most of the parameters are
 reported as optimized out due to variety of reasons. Some of parameters are
 really optimized out, but some of their locations could be calculated. Expert
 software developers are able to find what values parameters had at function
 entry point by using the technique that requires searching those values in
 disassembly of caller frame at place of that particular function call.
 Automation of such technique is described by DWARF 5 specifications and it is
 already implemented in GCC and GDB since 2011. The goal of this paper is to
 present ideas, implementation and problems that we encountered while we were
 working on this feature in LLVM. We will also show the improvement by
 presenting recovered parameters in some of the call-traces. This feature should
 improve debugging of optimized code built with LLVM by recovering optimized-out
 function parameters.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_11">
 <b>LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong</b>
  [ Video ]
  [ <a href="slides/TechTalk-Kreindl-LLVM_IR_in_GraalVM.pdf">Slides</a> ]<br>
 <i>Jacob Kreindl (Johannes Kepler University Linz)</i>
 <p>Sulong is an execution engine for LLVM bitcode that has support for
 debugging programs at the level of source code as well as textual LLVM IR. It
 is part of GraalVM, a polyglot virtual machine that can also execute programs
 written in multiple dynamic programming languages such as Ruby and Python.
 Sulong supports GraalVM&#39;s language-agnostic tooling interface to provide a
 rich debugging experience to developers. This includes source-level debugging
 of native extensions compiled to LLVM bitcode and the dynamic language programs
 that use them, together in the same debugger session and front-end. Sulong also
 enables developers to debug programs at the level of LLVM IR, including
 stepping through the textual IR and inspecting the symbols it contains.</p>
 <p>In this talk we will describe different ways GraalVM enables users to debug
 programs that were compiled to LLVM bitcode. We will introduce the general
 features of GraalVM-based debuggers by demonstrating source-level debugging of
 a standalone C/C++ application. Building on this we will showcase GraalVM&#39;s
 ability to provide a truly integrated debugging experience for native
 extensions of dynamic language programs to users. We will further demonstrate
 Sulong&#39;s support for debugging programs at the LLVM-IR level.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_12">
 <b>LLDB Reproducers</b>
  [ Video ]
  [ <a href="slides/TechTalk-Devlieghere-LLDB_reproducers.pdf">Slides</a> ]<br>
 <i>Jonas Devlieghere (Apple)</i>
 <p>The debugger, like the compiler, is a complex piece of software where bugs
 are inevitable. When a bug is reported, one of the first steps in its life
 cycle is trying reproduce the problem. Given the number of moving parts in the
 debugger, this can be quite challenging. Especially for more sophisticated
 problems, a small changes in the environment, the binary, its dependencies, or
 debug information might hide the problem. Getting this right puts a heavy
 burden on both the reporter and the developer.</p>
 <p>Reproducers are a way to automate this process. They contains the necessary
 information for a bug to occur again with minimal interaction from the
 developer. For clang a reproducer consists of a script with the compiler
 invocation and a pre-processed source file. Doing the same thing for the
 debugger is much more complicated.</p>
 <p>This talk discusses what was needed to have working reproducers for LLDB. It
 goes into detail about what information was needed, how it was captured and
 finally how the debugger uses it to reproduce an issue. The high level design
 is addressed as well as some of the challenges, such as dealing with low-level
 details, remote debugging, and the SB API. It concludes with an overview of
 what is possible and what isn&#39;t.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_13">
 <b>Sulong: An experience report of using the "other end" of LLVM in GraalVM.</b>
  [ Video ]
  [ <a href="slides/TechTalk-Schatz-Sulong_an_experience_report.pdf">Slides</a> ]<br>
 <i>Roland Schatz (Oracle Labs), Josef Eisl (Oracle Labs)</i>
 <p>The most common use-case for LLVM is to re-use its back-end to implement a
 compiler for new programming languages. In project Sulong, we are going a
 different route: We use LLVM frontends, and consume the resulting bitcode.
 Sulong is the LLVM bitcode execution engine of GraalVM, a ployglot virtual
 machine that executes JavaScript, Python, Ruby, R, and others. The goal of
 Sulong is to bring C, C++, Fortran, and other languages that compile to LLVM
 bitcode into the system, and allow low-cost interoperability across language
 borders. The latter is crucial for efficiently supporting existing native
 interfaces of dynamic languages.</p>
 <p>In this talk, we want to share our experience with implementing an engine
 for executing LLVM IR in GraalVM. We will discuss how Sulong executes LLVM
 bitcode and why this allows high-performance interoperability between
 languages. We will show the challenges of implementing existing native
 interfaces in new runtime environments, and how we use the different parts of
 the LLVM project for solving them. We want to focus on situations we found
 challenging and where we think we can contribute to the project.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_14">
 <b>SYCL compiler: zero-cost abstraction and type safety for heterogeneous computing</b>
  [ Video ]
  [ <a href="slides/TechTalk-Savonichev-SYCL_compiler.pdf">Slides</a> ]<br>
 <i>Andrew Savonichev (Intel)</i>
 <p>SYCL is an abstraction layer for C++, that allows a developer to write
 heterogeneous programs in a &quot;single source&quot; model, where host and
 device code are written in the same file. Utilizing modern C++ features, SYCL
 provides a way to develop type-safe and efficient programs for various
 accelerator devices.</p>
 <p>Although SYCL is designed as &quot;extension-free&quot; standard C++ API,
 there is a need to have some compiler extensions to enable C++ code execution
 on accelerators. SYCL compiler is responsible for &quot;extracting&quot; device
 part of code and compiling it to SPIR-V format or device native binary. In
 addition to that, compiler should also emit auxiliary information, which is
 used by SYCL runtime to run a device code via OpenCL API.</p>
 <p>This talk will go over technical details of the SYCL compiler, and the
 changes we need to make in order to bring full support for SYCL into upstream
 LLVM and Clang as described in the RFC:
 <a href="https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html">
   https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html</a></p>
 </td></tr>

 <tr><td valign="top" id="Talk_15">
 <b>Handling all Facebook requests with JITed C++ code</b>
  [ Video ]
  [ <a href="slides/TechTalk-Guo-Zhou-Handling_all_Facebook_requests_with_JITed_C++_code.pdf">Slides</a> ]<br>
 <i>Huapeng Zhou (Facebook), Yuhan Guo (Facebook)</i>
 <p>Facebook needs an efficient scripting framework to enable fast iteration of
 HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine and
 code deployment ecosystem was created to compile/link/execute C++ script at
 run-time, using Clang and LLVM ORC APIs. The framework allows developers to
 write business logic and unit test in C++ script, as well as debug using GDB.
 Profiling using perf is also supported for PGO purpose. This new framework
 outperformed another previously used scripting language by up to 4X, measured
 in execution time.</p>
 <p>In order to power the C++ script in ABI compatible way, a PCH (pre-compiled
 header) is built statically to provide declarations and definitions of
 necessary dependent types and methods. Clang APIs are then used at run-time to
 transform source code to LLVM IR, which are later passed through LLVM ORC
 layers for linking/optimizing. Above Clang/LLVM toolchains are statically
 linked into main binary to ensure compatibility between PCH and C++ scripts. As
 a result, scripts could be deployed in real time without any main binary
 change.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_16">
 <b>clang-scan-deps: Fast dependency scanning for explicit modules</b>
  [ Video ]
  [ <a href="slides/TechTalk-Lorenz-clang-scan-deps_Fast_dependency_scanning_for_explicit_modules.pdf">Slides</a> ]<br>
 <i>Alex Lorenz (Apple), Michael Spencer (Apple)</i>
 <p>The dependency information that&#39;s provided by Clang can be used to
 implement a pre-scanning phase for a build system that uses Clang modules in an
 explicit manner, by discovering the required modules before compiling. However,
 the traditional approach of preprocessing all sources to find the required
 modular dependencies is typically not fast enough for a pre-scanning phase that
 must run for every build. This talk introduces clang-scan-deps, an optimized
 dependency discovery service that can provide speed up of up to 10X over the
 regular preprocessor-based scanning. This talk goes into details of how this
 service is implemented and how it can be leveraged by the build system to
 implement a fast pre-scanning phase for explicit Clang modules.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_17">
 <b>Clang tools for implementing cryptographic protocols like OTRv4</b>
  [ Video ]
  [ <a href="slides/TechTalk-Celi-Clang_tools_for_implementing_cryptographic_protocols_like_OTRv4.pdf">Slides</a> ]<br>
 <i>Sofia Celi (Centro de Autonomia Digital)</i>
 <p>OTRv4 is the newest version of the Off-The-Record protocol. It is a protocol
 where the newest academic research intertwines with real-world implementations:
 it provides end to end encryption, and offline and online deniability for
 interactive and non-interactive applications. As a real world protocol, it
 needs to provide an implementation that works for real world users. For this,
 the OTRv4 team decided to implement it in C. But as we know, working in C can
 be challenging due to several factors.</p>
 <p>In order to make OTRv4s implementation much safer and usable, we decided to
 use several clang tools, such as clang format, clang tidy and address
 sanitizers. By using these tools, we uncovered bugs, issues and problems. In
 this talk, we aim to highlight the most interesting bugs we uncovered by using
 these tools, by comparing the results of using static analysis and fast memory
 error detector. We also aim to highlight the importance of using a specific
 code formatting style, as it makes an implementation much clearer and easier to
 find bugs. We plan to high point the importance of using these tools on real
 world implementations that are going to be used by millions of users and that
 aim to provide the best security properties available.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_18">
 <b>Implementing the C++ Core Guidelines&#39; Lifetime Safety Profile in Clang</b>
  [ Video ]
  [ <a href="slides/TechTalk-Horvath-Implementing_the_C++_Core_Guidelines_Lifetime.pdf">Slides</a> ]<br>
 <i>Gabor Horvath (Eotvos Lorand University), Matthias Gehre (Silexica GmbH),
   Herb Sutter (Microsoft)</i>
 <p>This is an experience report of the Clang-based implementation of Herb
 Sutter&#39;s Lifetime safety profile for the C++ Core Guidelines, available
 online at cppx.godbolt.org.</p>
 <p>We will cover the kinds of diagnoses supported by the checker and how they
 are implemented using Clang&#39;s control flow graph. We will discuss what are
 the main problems of the current prototype and what are we going to do to fix
 those. We also plan to discuss the upstreaming process. Some parts of the
 analysis might end up improving existing clang warnings some of which are on by
 default. We will also summarize early experience with performance against
 real-world code bases, including compile time performance for LLVM sources with
 the checker.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_19">
 <b>Changes to the C++ standard library for C++20</b>
  [ Video ]
  [ <a href="slides/TechTalk-Clow-Changes_to_the_C++_standard_library_for_C++20.pdf">Slides</a> ]<br>
 <i>Marshall Clow (CppAlliance)</i>
 <p>The next version of the C++ standard will almost certainly be approved next
 year, and be called C++20.   There will be many new features in the standard
 library in C++20. Things like ranges, concepts, calendar support, and many
 others. In this talk, I&#39;ll give an overview of the new features, and an
 update on the status of their implementation in libc++.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_20">
 <b>Adventures with RISC-V Vectors and LLVM</b>
  [ Video ]
  [ <a href="slides/TechTalk-Kruppe-Espasa-RISC-V_Vectors_and_LLVM.pdf">Slides</a> ]<br>
 <i>Robin Kruppe (TU Darmstadt), Roger Espasa (Esperanto Technologies)</i>
 <p>RISC-V is a free and open instruction set architecture (ISA) with an
 established LLVM backend and numerous open-source and proprietary hardware
 implementations. The work-in-progress vector extension adds standardized vector
 processing, taking lessons both from traditional long-vector machines and from
 packed-SIMD approaches that dominated industrial designs in the past few
 decades. The resulting architecture aims to excel at various scales, from small
 embedded cores to large HPC accelerators and everything in between.</p>
 <p>In this talk you will learn about the RISC-V vector ISA as well as LLVM
 support for it: vectorizing loops without needing scalar remainder handling,
 vectors whose length is not known at compile time, a vector unit that can be
 dynamically reconfigured for increased efficiency, and more.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_21">
 <b>A Tale of Two ABIs: ILP32 on AArch64</b>
  [ Video ]
  [ <a href="slides/TechTalk-Northover-A_tale_of_two_ABIs_ILP32_on_AArch64.pdf">Slides</a> ]<br>
 <i>Tim Northover (Apple)</i>
 <p>We faced the challenge of seamlessly running 32b application binaries on a
 new 64b S4 chip, which has no hardware support to run 32b binaries. Translating
 the ARM binaries directly to the new hardware would be hard, but when an
 application is available in bitcode format, the task is much more feasible.
 This talk opens the curtain for an inside look into the decisions and steps
 taken to translate 32b bitcode for the new 64b hardware. It will discuss the
 many design, implementation and verification challenges of introducing a new
 ABI, arm64_32, which guarantees that the binaries for the new S4 chip are
 compatible to the original 32b applications.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_22">
 <b>LLVM Numerics Improvements</b>
  [ Video ]
  [ <a href="slides/TechTalk-Berg-LLVM_numerics_improvements.pdf">Slides</a> ]<br>
 <i>Michael Berg (Apple), Steve Canon (Apple)</i>
 <p>Some LLVM based compilers currently provide two modes of floating point code
 generation. The first mode, called fast-math, is where performance is the
 primary consideration over numerical precision and accuracy. This mode does not
 strictly follow the IEEE-754 standard, but has proven useful for applications
 that do not require this level of precision. The second mode, called
 precise-math, is where the compiler carefully follows the subset of behavior
 defined in the IEEE standard that is applicable to conforming hardware targets.
 This mode is primarily used for compute workloads and wherever fast-math
 precision is inadequate, however it runs much slower as it requires a larger
 number of instructions in general. In practice neither of these modes is
 particularly desirable. The fast-math mode ignores a significant portion of the
 standard as pertains to handling undefined values described as Not a Number
 (NaNs) and Infinities (INFs), resulting in difficulties for certain workloads
 when the hardware target computes these values correctly and performance
 remains critical.</p>
 <p>Until recently these two models were mutually exclusive, however with the
 addition of IR flags they need not be.  For instance, the FastMath metadata
 module flag drives behavior deemed numerically unsafe when it is enabled, by
 indiscriminately enabling optimizations. With IR flags this behavior can be
 enabled with much finer granularity, allowing various code forms to be fast or
 precise together in one module.  We call this mixed mode compilation.  IR flags
 can be used individually or paired to produce desired floating point behavior
 under specified constraints with fine granularity of control.  Optimization
 passes have been modified under this new kind of control to produce this
 behavior.  This talk will describe the recent numerics work and discuss the
 implications for front-ends and backends built with LLVM.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_23">
 <b>DOE Proxy Apps: Compiler Performance Analysis and Optimistic Annotation Exploration</b>
  [ Video ]
  [ <a href="slides/TechTalk-Homerding-DOE_proxy_apps_compiler_performance_analysis_and_optimistic_annotation_exploration.pdf">Slides</a> ]<br>
 <i>Brian Homerding (Argonne National Laboratory), Johannes Doerfert (Argonne National Laboratory)</i>
 <p>The US Department of Energy proxy applications are simplified models of the
 key components of various scientific computing workloads.  These proxy
 applications are useful for research and exploration in many areas, including
 software technology.  We have conducted performance analysis of these proxy
 application using Clang, GCC and some vendor compilers.  These results have
 identified and motivated our work on modelling the memory access of math
 functions in Clang.  We will discuss our design and our work to expose this
 ability to encode function information to the developer.  Additionally in this
 area, I will then discuss my collaboration on a development tool designed to
 explore both the potential performance gap lost from knowledge the developer
 could encode (but did not) and the extent to which LLVM is able to profitably
 make use of this information.</p>
 </td></tr>

 <tr><td valign="top" id="Talk_24">
 <b>Loop Fusion, Loop Distribution and their Place in the Loop Optimization Pipeline</b>
  [ Video ]
  [ <a href="slides/TechTalk-Barton-Loop_fusion_loop_distribution_and_their_place_in_the_loop_optimization_pipeline.pdf">Slides</a> ]<br>
 <i>Kit Barton (IBM), Johannes Doerfert (Argonne National Lab), Hal Finkel
   (Argonne National Lab), Michael Kruse (Argonne National Lab)</i>
 <p>Loop fusion and loop distribution are two key optimizations that typically
 are featured prominently in a loop optimization pipeline. They are used both to
 improve performance of applications and also to enable other loop
 optimizations. For example, loop fusion can improve the performance of
 applications through increasing temporal data cache locality. It can also
 increase the scope of other optimizations by creating larger loop nests for
 intra-loop nest optimizations to work on. Similarly, loop distribution is often
 used to improve performance directly by distributing loops that exceed hardware
 resources (e.g., register pressure). It is also frequently used to distribute
 loops containing loop-carried dependencies into two loops: one with loop
 carried dependencies and the second with no loop carried dependencies; this
 enables other optimizations (e.g., vectorization) on the independent loop.
 Furthermore, these two optimizations can work nicely together, as they have the
 ability to &quot;undo&quot; transformations done by the other. Thus, the
 implementation of both of these optimizations must be robust as they can both
 play an important role in a loop optimization pipeline.</p>
 <p>This talk will be a follow-on to &quot;Revisiting Loop Fusion, and its place
 in the loop transformation framework&quot;, presented at the 2018 LLVM
 Developers&#39; Meeting. The patch to implement basic loop fusion described in
 the talk is currently undergoing review on phabricator (<a
 href="https://reviews.llvm.org/D55851">https://reviews.llvm.org/D55851</a>). We
 have prototypes to make loop fusion more aggressive by moving code from between
 two loops (making them adjacent) that will be posted for review once the basic
 loop fusion patch is accepted. We also have plans to peel loops to (to make
 their bounds conform), and improve the dependence analysis between the two loop
 bodies. This talk will also include findings from our current analysis of the
 loop distribution pass in LLVM. It will provide a summary of the strengths and
 limitations of loop distribution, and summarize any improvements that are made
 prior to EuroLLVM 2019. Finally, the presentation will discuss how loop fusion
 and loop distribution can fit into the existing loop optimization pipeline in
 LLVM.</p>
 </td></tr>
 </table>

 <div class="www_sectiontitle" id="Tutorial">Tutorials</div>
 <table cellpadding="10">

 <tr><td valign="top" id="Tutorial_1">
 <b>Tutorial: Building a Compiler with MLIR</b>
  [ <a href="https://www.youtube.com/watch?v=cyICUIZ56wQ">Video</a> ]
  [ <a href="slides/Tutorial-AminiVasilacheZinenko-MLIR.pdf">Slides</a> ]<br>
 <i>Amini Mehdi (Google), Nicolas Vasilache (Google), Alex Zinenko (Google)</i>
 <p>This tutorial will complement the technical talk about MLIR. We will
 implement a custom DSL for numerical processing and walk the audience
 step-by-step through the use of MLIR to support the lowering and the
 optimization of such DSL, and target LLVM for lower level optimizations and
 code generation or JIT execution.</p>
 </td></tr>

 <tr><td valign="top" id="Tutorial_2">
 <b>Building an LLVM-based tool: lessons learned</b>
  [ Video ]
  [ <a href="slides/Tutorial-Denisov-Building_an_LLVM_based_tool.pdf">Slides</a> ]<br>
 <i>Alex Denisov</i>
 <p>In this talk, I want to share my experience in building an LLVM-based tool.</p>
 <p>For the last three years, I work on a tool for mutation testing. Currently,
 it works on Linux, macOS, and FreeBSD and the source code is compatible with
   any LLVM version between 3.9 and 7.0. Anything that can run in parallel -
   runs in parallel. I will cover the following topics:<ul>
 <li>Build system: on supporting multiple LLVM versions and building against
   sources or precompiled binary.</li>
 <li>Parallelization: which parts of the tool can be parallelized and which
   should run in one thread</li>
 <li>Testing: how to build robust test suite for the tool</li>
 <li>Bitcode: on several ways to convert a program into LLVM bitcode, that can
   be used by the tool.</li>
 </ul></p>
 </td></tr>

 <tr><td valign="top" id="Tutorial_3">
 <b>LLVM IR Tutorial - Phis, GEPs and other things, oh my!</b>
  [ Video ]
  [ <a href="slides/Tutorial-Bridgers-LLVM_IR_tutorial.pdf">Slides</a> ]<br>
 <i>Vince Bridgers (Intel Corporation), Felipe de Azevedo Piovezan (Intel Corporation)</i>
 <p>LLVM intermediate representation (IR) is the abstract description machine
 operations used to translate LLVM front ends to a form that&#39;s executable by
 a target machine. Optimizations and transformations are performed on the IR by
 the LLVM library to create executable images. This tutorial will introduce the
 IR syntax, describe basic tools for manipulating IR formats, and describe
 mappings of IR from various common source code control structures. Tutorial
 materials with specific examples will be made available for the tutorial
 presentation, and for offline review.</p>
 </td></tr>
 </table>

 <div class="www_sectiontitle" id="SRC">Student Research Competition</div>
 <table cellpadding="10">

 <tr><td valign="top" id="SRC_1">
 <b>Safely Optimizing Casts between Pointers and Integers</b>
  [ Video ]
  [ <a href="slides/SRC-Lee-Safely_optimizing_casts_between_pointers_and_integers.pdf">Slides</a> ]<br>
 <i>Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul
   National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu
   (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P.
 Lopes (Microsoft Research, UK)</i>
 <p>In this talk, a list of optimizations that soundly removes casts between
 pointers and integers will be presented. In LLVM, a pointer is more than just
 an integer: LLVM allows a pointer to track its underlying object, and the rule
 to find it is defined as based-on relation. This allows LLVM to aggressively
 optimize load/stores, but makes the meaning of pointer-integer casts
 complicated. This causes conflict between existing optimizations, causing
 long-standing miscompilation bugs like 34548.</p>
 <p>To fix it, we suggest disabling folding of inttoptr(ptrtoint(p)) to p and
 using a safe workaround to remove them. This optimization is important because
 it&#39;s removing a significant portion of such cast pairs. We&#39;ll show that
 even if the optimization is disabled, majority of casts can be removed by
 carefully adding new \&amp; modifying existing optimizations. After the
 updates, the performance is still comparable to the original LLVM.</p>
 </td></tr>

 <tr><td valign="top" id="SRC_2">
 <b>An alternative OpenMP Backend for Polly</b>
  [ Video ]
  [ <a href="slides/SRC-Halkenhauser-An_alternative_OpenMP_backend_for_Polly.pdf">Slides</a> ]<br>
 <i>Michael Halkenh&auml;user (TU Darmstadt)</i>
 <p>LLVM&#39;s polyhedral infrastructure framework Polly may automatically
 exploit thread-level parallelism through OpenMP. Currently, the user can only
 influence the number of utilized threads, while other OpenMP parameters such as
 the scheduling type and chunk size are set to fixed values. This in turn,
 limits a user&#39;s ability to adapt the optimization process for a given
 problem.</p>
 <p>In this work, we present an alternative OpenMP backend for Polly, which
 provides additional customization options to the user and is based on the LLVM
 OpenMP runtime. We evaluate our new backend and the influence of the new
 customization options on performance and compare to Polly&#39;s existing OpenMP
 backend.</p>
 </td></tr>

 <tr><td valign="top" id="SRC_3">
 <b>Implementing SPMD control flow in LLVM using reconverging CFGs</b>
  [ Video ]
  [ <a href="slides/SRC-Wahlster-Implementing_SPMD_control_flow_in_LLVM_using_reconverging_CFGs.pdf">Slides</a> ]<br>
 <i>Fabian Wahlster (Technische Universit&auml;t M&uuml;nchen), Nicolai
   H&auml;hnle (Advanced Micro Devices)</i>
 <p>Compiling programs for an SPMD execution model, e.g. for GPUs or for whole
 program vectorization on CPUs, requires a transform from the thread-level input
 program into a vectorized wave-level program in which the values of the
 original threads are stored in corresponding lanes of vectors.  The main
 challenge of this transform is handling divergent control flow, where threads
 take different paths through the original CFG.  A common approach, which is
 currently taken by the AMDGPU backend in LLVM, is to first structurize the
 program as a simplification for subsequent steps.</p>
 <p>However, structurization is overly conservative.  It can be avoided when
 control flow is uniform, i.e. not divergent.  Even where control flow is
 divergent, structurization is often unnecessary.  Moreover, LLVM&#39;s
 StructurizeCFG pass relies on region analysis, which limits the extent to which
 it can be evolved.</p>
 <p>We propose a new approach to SPMD vectorization based on saying that a CFG
 is reconverging if for every divergent branch, one of the successors is a
 post-dominator.  This property is weaker than structuredness, and we show that
 it can be achieved while preserving uniform branches and inserting fewer new
 basic blocks than structurization requires.  It is also sufficient for code
 generation, because it guarantees that threads which &quot;leave&quot; a wave
 at divergent branches will be able to rejoin it later.</p>
 </td></tr>

 <tr><td valign="top" id="SRC_4">
 <b>Function Merging by Sequence Alignment</b>
   [ Video ]
   [ <a href="slides/SRC-Rocha-Function_merging_by_sequence_alignment.pdf">Slides</a> ]<br>
 <i>Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of
   Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of
   Edinburgh), Hugh Leather (University of Edinburgh)</i>
 <p>Resource-constrained devices for embedded systems are becoming increasingly
 important. In such systems, memory is highly restrictive, making code size in
 most cases even more important than performance. Compared to more traditional
 platforms, memory is a larger part of the cost and code occupies much of it.
 Despite that, compilers make little effort to reduce code size. One key
 technique attempts to merge the bodies of similar functions. However,
 production compilers only apply this optimization to identical functions, while
 research compilers improve on that by merging the few functions with identical
 control-flow graphs and signatures. Overall, existing solutions are
 insufficient and we end up having to either increase cost by adding more memory
 or remove functionality from programs.</p>
 <p>We introduce a novel technique that can merge arbitrary functions through
 sequence alignment, a bioinformatics algo- rithm for identifying regions of
 similarity between sequences. We combine this technique with an intelligent
 exploration mechanism to direct the search towards the most promising function
 pairs.  Our approach is more than 2.4x better than the state-of-the-art,
 reducing code size by up to 25%, with an overall average of 6%, while
 introducing an average compilation-time overhead of only 15%. When aided by
 profiling information, this optimization can be deployed without any
 significant impact on the performance of the generated code.</p>
 </td></tr>

 <tr><td valign="top" id="SRC_5">
 <b>Compilation and optimization with security annotations</b>
  [ Video ]
  [ <a href="slides/SRC-TuanVu-Compilation_and_optimization_with_security_annotations.pdf">Slides</a> ]<br>
 <i>Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM),
   Albert Cohen (Google)</i>
 <p>Program analysis and program transformation systems need to express
 additional program properties, to specify test and verification goals, and to
 enhance their effectiveness. Such annotations are typically inserted to the
 representation on which the tool operates; e.g., source level for establishing
 compliance with a specification, and binary level for the validation of secure
 code. While several annotation languages have been proposed, these typically
 target the expression of functional properties. For the purpose of implementing
 secure code, there has been little effort to support non-functional properties
 about side-channels or faults. Furthermore, analyses and transformations making
 use of such annotations may target different representations encountered along
 the compilation flow.</p>
 <p>We extend an annotation language to express a wider range of functional and
 non-functional properties, enabling security-oriented analyses and influencing
 the application of code transformations along the compilation flow. We
 translate this language to the different compiler representations from abstract
 syntax down to binary code. We explore these concepts through the design and
 implementation of an optimizing, annotation-aware compiler, capturing
 annotations from the program source, propagating and emitting them in the
 binary, so that binary-level analysis tools can use them.</p>
 </td></tr>

 <tr><td valign="top" id="SRC_6">
 <b>Adding support for C++ contracts to Clang</b>
  [ Video ]
  [ <a href="slides/SRC-Lopez-Gomez-Adding_support_for_C++_contracts_to_clang.pdf">Slides</a> ]<br>
 <i>Javier L&oacute;pez-G&oacute;mez (University Carlos III of Madrid), J.
   Daniel Garc&iacute;a (University Carlos III of Madrid)</i>
 <p>A language supporting contract-checking allows to detect programming errors.
 Also, making this information available to the compiler may cause it to perform
 additional optimizations.</p>
 <p>This paper presents our implementation of the P0542R5 technical
 specification (now part of the C++20 working draft).</p>
 </td></tr>
 </table>

 <div class="www_sectiontitle" id="LightningTalk">Lightning talks</div>
 <table cellpadding="10">

 <tr><td valign="top" id="LightningTalk_1">
 <b>LLVM IR Timing Predictions: Fast Explorations via lli</b>
  [ Video ]
  [ Slides ]<br>
 <i>Alessandro Cornaglia (FZI - Research Center for Information Technology)</i>
 <p>Many applications, especially in the embedded domain, have to be executed on
 different hardware target platforms. For these applications, it is necessary to
 evaluate both functional and non-functional properties, such as software
 execution time, in all their hardware/software combinations. Especially in the
 context of software product line engineering, it is not feasible to test all
 variants one-by-one. The intermediate representation of the source code offers
 an attractive opportunity for a single-run analysis, because it covers the
 software variability, while at the same time omitting the hardware-dependent
 optimizations.</p>
 <p>We present an extension for the LLVM IR execution engines, which are part of
 the LLVM lli tool. The extension evaluates on the fly functional and
 non-functional properties for all the hardware variants during one lli
 execution. In particular, our extension is designed for the evaluation of the
 execution time of a program for multiple target platforms considering different
 software variants. Both the interpreter and JIT execution modes are supported.
 Prospectively, our approach will be enriched with multiple analysis techniques.
 Thanks to our approach, it is now possible to evaluate software variants with
 regard to multiple hardware platforms in a single lli execution run.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_2">
 <b>Simple Outer-Loop-Vectorization == LoopUnroll-And-Jam + SLP</b>
  [ Video ]
  [ <a href="slides/Lightning-Das-AMD_OLV.pdf">Slides</a> ]<br>
 <i>Dibyendu Das (AMD)</i>
 <p>In this brief talk I will show how Outer-Loop-Vectorization (OLV), which is
 of great interest to the LLVM community, can be visualized as a combination of
 two transformations applied to a loop-nest of interest. These two
 transformations are LoopUnrollAndJam and SLP. LoopUnrollAndJam is a fairly new
 addition to the LLVM loop-optimization repertoire. Combined with a fairly
 powerful SLP that LLVM supports today, we are able to vectorize the outer loop
 of several important kernels automatically without the support of any pragma.
 At present our implementation is at the level of a PoC and does not exploit any
 rigorous costing mechanism. While we understand that OLV is being implemented
 in the LoopVectorizer using the VPlan technique, this paper highlights a quick
 and cheap way to solve the same problem in a different manner using two
 existing transforms.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_3">
 <b>Clacc 2019: An Update on OpenACC Support for Clang and LLVM</b>
  [ Video ]
  [ <a href="slides/Lightning-Denny-clacc.pdf">Slides</a> ]<br>
 <i>Joel E. Denny (Oak Ridge National Laboratory), Seyong Lee (Oak Ridge
   National Laboratory), Jeffrey S. Vetter (Oak Ridge National Laboratory)</i>
 <p>We are developing production-quality, standard-conforming OpenACC [1]
 compiler and runtime support in Clang and LLVM for the US Exascale Computing
 Project [2][3].  A key strategy of Clacc&#39;s design is to translate OpenACC
 to OpenMP in order to leverage Clang&#39;s existing OpenMP compiler and runtime
 support and to minimize implementation divergence.  To maximize reuse of the
 OpenMP implementation and to facilitate research and development into new
 source-level tools involving both the OpenACC and OpenMP levels, Clacc
 implements this translation in the Clang AST using Clang&#39;s TreeTransform
 facility.  However, we are also following LLVM IR parallel extensions being
 developed by the community as a path to improve compiler optimizations and
 analyses.</p>
 <p>The purpose of this talk is to provide an update on Clacc progress over the
 preceding year including early performance results, to present the plan for the
 year ahead, and to invite participation from others.  Clacc&#39;s OpenACC
 support is still maturing and we have not yet offered it upstream.  However, we
 have already upstreamed many mutually beneficial improvements from the Clacc
 project, including improvements to LLVM&#39;s testing infrastructure and to
 Clang and its OpenMP support.  This talk will summarize those contributions as
 well.</p>
 <p>[1] OpenACC standard: <a href="https://www.openacc.org/">https://www.openacc.org/</a> </p>
 <p>[2] Clacc: Translating OpenACC to OpenMP in Clang.  Joel E. Denny, Seyong
 Lee, and Jeffrey S. Vetter.  2018 IEEE/ACM 5th Workshop on the LLVM Compiler
 Infrastructure in HPC (LLVM-HPC), Dallas, TX, USA, (2018).</p>
 <p>[3] Clacc: OpenACC Support for Clang and LLVM.  Joel E. Denny, Seyong Lee,
 and Jeffrey S. Vetter.  2018 European LLVM Developers Meeting (EuroLLVM
 2018).</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_4">
 <b>Targeting a statically compiled program repository with LLVM</b>
  [ Video ]
  [ <a href="slides/Lightning-Camp-Program_Repo.pdf">Slides</a> ]<br>
 <i>Phil Camp (SN Systems), Russell Gallop (SN Systems)</i>
 <p>Following on from the 2016 talk &quot;Demo of a repository for statically
 compiled programs&quot;, this lightning talk will present a brief overview of
 how LLVM was modified to target a program repository. This includes adding a
 new target output format and a new optimization pass to skip program elements
 already present in the repository. Reference: <a
 href="https://github.com/SNSystems/llvm-prepo">https://github.com/SNSystems/llvm-prepo</a></p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_5">
 <b>Does the win32 clang compiler executable really need to be over 21MB in size?</b>
  [ Video ]
  [ <a href="slides/Lightning-Gallop-Does_win32_clang_compiler.pdf">Slides</a> ]<br>
 <i>Russell Gallop (SN Systems), Greg Bedwell (SN Systems)</i>
 <p>The title of this lighting talk is from a bug filed in the early days of the
 PS4 compiler. It noted that the LLVM-based PS4 compiler was more than 3 times
 larger than the PS3 compiler. Since then it has almost doubled to over 40MB.
 For a compiler which targets one system this seems excessive. Executable size
 can cost in worse cache performance and cost time if transferring for
 distributed builds.</p>
 <p>In this lightning talk I will look at where this comes from and how it can
 be managed.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_6">
 <b>Resolving the almost decade old checker dependency issue in the Clang Static Analyzer</b>
  [ Video ]
  [ <a href="slides/Lightning-Umann-Resolving_the_almost.pdf ">Slides</a> ]<br>
 <i>Krist&oacute;f Umann (Ericsson Hungary, E&ouml;tv&ouml;s Lor&aacute;nd University)</i>
 <p>As checkers grew in numbers in the Static Analyzer, the problem of certain
 checkers depending on one another was inevitable. One particular problem, for
 example, is that a checker called MallocChecker, which despite its name does
 all sorts of memory allocation and de- or reallocation related checks, depends
 on CStringChecker to model calls to strcmp. While these checkers are completely
 separate entities, the Static Analyzer also contains large checker classes that
 in fact expose multiple checkers to the user: For example, IteratorChecker has
 a modeling part, and it exposes 3 iterator related checkers, and enabling any
 of the three will also enable the unexposed modeling part. Having both of these
 structures makes it difficult to find a solution where the developer (or the
 experienced user) can easily see what checkers are enabled, as these
 dependencies are only expressed in the implementation.</p>
 <p>This talk is going to discuss elegant solutions as to how these rather
 fragile checker structures can be preserved by declaring these dependencies in
 TableGen files, how checker developers (and users) can ensure that when the
 analyzer is invoked, only the requested checkers will be enabled, and also take
 a very brief look at what other features the analyzer gained thanks to these
 issues being resolved.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_7">
 <b>Adopting LLVM Binary Utilities in Toolchains</b>
  [ Video ]
  [ <a href="slides/Lightning-Rupprecht-Adopting_LLVM_binary_utilities.pdf">Slides</a> ]<br>
 <i>Jordan Rupprecht (Google)</i>
 <p>Although many projects have migrated from GCC-based toolchains to
 Clang-based ones, tools from the GNU Binutils collection are still widely used
 despite having equivalents in the LLVM project. The problems faced when
 attempting to use LLVM tools range anywhere from simple command line syntax
 differences to unimplemented or buggy features. In this talk, I will describe
 some of the types of challenges we faced when adopting LLVM tools, as well as
 some of the strategies we used to test the toolchain.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_8">
 <b>Multiplication and Division in the Range-Based Constraint Manager</b>
  [ Video ]
  [ <a href="slides/Lightning-Balogh-Multiplication_and_Division_in_the_Range-Based_Constraint_Manager.pdf">Slides</a> ]<br>
 <i>&Aacute;d&aacute;m Balogh (Ericsson Hungary Ltd.)</i>
 <p>The default constraint manager of the Clang Static Analyzer is a simple
 range-based constraint manager: it stores and manages the valid ranges for the
 values of symbolic expressions. Upon new assumptions it further constrains
 these ranges which often results in an empty range which tells the analyzer
 that the assumption is impossible. Until now the constraint manager could
 handle basic assumptions: A &lt;rel&gt; m, A + n &lt;rel&gt; m and A - n
 &lt;rel&gt; m where A is a symbolic expression, n and m integer constants and
 &lt;rel&gt; a relational operator. In the latter two cases where a constant is
 added or subtracted from the symbolic expression the range of the additive
 expression is calculated by adjusting the range circularly by the constant.
 However, it could not cope with division and multiplication, thus not even the
 range for A*2 could be deduced from the range of A. This shortcoming lead to
 both false positives and missed true positives.</p>
 <p>To improve the true positive/false positive ratio of the analyzer we
 extended the range-based constraint manager to be able to handle expressions of
 the format A &lt;mul&gt; k &lt;add&gt; n &lt;rel&gt; m, where A is a symbolic
 expression, k, m and n integer constants, &lt;mul&gt; a multiplicative operator
 (* or /), &lt;add&gt; an additive operator (+ or -) and &lt;rel&gt; a
 relational operator. The main challenge in our work was to correctly scale the
 ranges in the circular arithmetic: for example in case of signed 8 bit types in
 A * 2 == 56 the value of A could not only be 28, but also -100. Similarly, in A
 / 3 == 4 the value of A is not necessarily 12, but anything in range [12..14].
 To ensure full correctness we also proved our solution: first we generated
 every range for every constants in both the 8 bit signed and unsigned
 arithmetic, then we tested whether the scaling algorithm calculates exactly the
 same ranges. Finally we extrapolated this algorithm to wider integer types and
 ported it to the range-based constraint manager. According to our measurements
 there is no significant change in the performance and in the talk we will
 present numbers of lost false positives and new true positives.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_9">
 <b>Statistics Based Checkers in the Clang Static Analyzer</b>
  [ Video ]
  [ <a href="slides/Lightning-Balogh-Statistics_based_checkers_in_the_Clang_Static_Analyzer.pdf">Slides</a> ]<br>
 <i>&Aacute;d&aacute;m Balogh (Ericsson Hungary Ltd.)</i>
 <p>In almost every development project there are some conventions that the
 return value of some functions in an external library must be compared to some
 extremal value, such as zero. For example, many integer functions return
 negative number in case of error similarly to pointer functions returning null
 pointers. In a large project with many external functions it is virtually
 impossible to formalize all these rules explicitly: they are either unwritten
 or only exist in a natural language. To help enforcing these rules, we created
 checkers in the Clang Static Analyzer to explore these rules on statistical
 base and check the code for them. We currently support two kinds of extremal
 values: negative numbers for functions returning integers and null pointers for
 functions returning pointers.</p>
 <p>Example:</p>
 <p>int i = may_return_return_negative();</p>
 <p>v[i]; // error: negative indexing</p>
 <p>Exploration and checking for these rules happens in two phases: in the first
 phase we check every function call and create a summary for each function
 recording the percentage the return value is checked for negativeness (integer
 functions) or nullness (pointer functions). If this percentage is above a
 defined threshold (85% by default) we assume that the rule for the function
 exists. The second phase is the usual execution of the analyzer where a checker
 checks the code for violations of the rule: it splits the execution path to two
 branches at the call of the listed functions, where the return value in one
 branch is an extremal value (negative for integers or null for pointers) and
 non-extremal value on the other branch. Other checkers (e.g. the null-pointer
 dereference checker) are expected to find errors on the extremal-value branch
 if they are not terminated in the code by checking for the extremal-value. The
 performance impact of the state-split is low: in at least 85% of the cases the
 extremal-value branch is terminated quickly, in the remaining cases we expect
 another checker to create a sink-node because of an error. The new checker is
 under evaluation on open-source projects. We found some false positives,
 however their amount can be reduced by involving the arguments into the
 statistics.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_10">
 <b>Flang Update</b>
  [ Video ]
  [ <a href="slides/Lightning-Scalpone-Flang_update.pdf">Slides</a> ]<br>
 <i>Steve Scalpone (NVIDA / PGI / Flang)</i>
 <p>An update about the current state of Flang, including a report on OpenMP 4.5
 target offload, Fortran performance and the new f18 front end.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_11">
 <b>Swinging Modulo Scheduling together with Register Allocation</b>
  [ Video ]
  [ Slides ]<br>
 <i>Lama Saba (Intel)</i>
 <p>VLIW architectures rely heavily on Modulo Scheduling to optimize ILP in
 loops. Modulo Scheduling can be achieved today in LLVM using the
 MachinePipeliner pass, which implements a Swing Modulo Scheduler prior to
 register allocation [1]. For some VLIW architectures, such as those lacking
 hardware interlocks or the ability to spill registers onto a stack, the
 MachinePipeliner&#39;s decisions become crucial for the success of the register
 allocation phase, since they affect the latter&#39;s decisions to generate
 splits or spills, which in turn can result in an inefficient or even an
 unsuccessful resource allocation.</p>
 <p>Nevertheless, even though the MachinePipeliner aims to schedule with a
 minimal Initiation Interval, it is structured in a way that facilities trying
 larger Initiation Intervals or a different ordering, this structure lends
 itself to alternative, possibly less aggressive scheduling retries, after more
 aggressive attempts have failed in register allocation.</p>
 <p>This talk introduces this issue and explores how we can achieve successful
 modulo scheduling and register allocation for such architectures in LLVM by
 introducing a repetitive rollback-and-retry mechanism for altering scheduling
 decisions based on the register allocator&#39;s outcome, and how we can
 leverage such an approach to improve the scheduling of VLIW architectures in
 general.</p>
 <p>[1] An Implementation of Swing Modulo Scheduling in a Production Compiler -
 Brendon Cahoon - <a
 href="http://llvm.org/devmtg/2015-10/slides/Cahoon-SwingModuloScheduling.pdf">
   http://llvm.org/devmtg/2015-10/slides/Cahoon-SwingModuloScheduling.pdf</a></p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_12">
 <b>LLVM for the Apollo Guidance Computer</b>
  [ Video ]
  [ Slides ]<br>
 <i>Lewis Revill (University of Bath)</i>
 <p>Nearly 50 years ago on the 20th of July 1969 humans set foot on the moon for
 the first time. Among the many extraordinary engineering feats that made this
 possible was the Apollo Guidance Computer, an innovative processor for its time
 with an instruction set that was thought up well before the advent of C. So 50
 years later, why not implement support for it in a modern compiler such as
 LLVM?</p>
 <p>This talk will give a brief overview of some of the architectural features
 of the Apollo Guidance Computer followed by an account of my implementation of
 an LLVM target so far. The shortcomings of LLVM when it comes to implementing
 such an unusual architecture will be discussed along with the workarounds used
 to overcome them.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_13">
 <b>Catch dangling inner pointers with the Clang Static Analyzer</b>
  [ Video ]
  [ <a href="slides/Lightning-Kovacs-Dangling_pointer_checker.pdf">Slides</a> ]<br>
 <i>R&eacute;ka Kov&aacute;cs (E&ouml;tv&ouml;s Lor&auml;nd University)</i>
 <p>C++ container classes provide methods that return a raw pointer to the
 container&#39;s inner buffer. When the container is destroyed, the inner buffer
 is deallocated. A common bug is to use such a raw pointer after deallocation,
 which may lead to crashes or other unexpected behavior.</p>
 <p>This lightning talk will present a new Clang Static Analyzer checker
 designed to address the above described problems, implemented last year as a
 Google Summer of Code project. The checker has found serious problems in
 popular open source projects with a negligible false positive rate. Future
 plans include adding support for view-like constructs and non-STL
 containers.</p>
 </td></tr>

 <tr><td valign="top" id="LightningTalk_14">
 <b>Cross translation unit test case reduction</b>
  [ Video ]
  [ <a href="slides/Lightning-Kovacs-Cross_TU_reduction.pdf">Slides</a> ]<br>
 <i>R&eacute;ka Kov&aacute;cs (E&ouml;tv&ouml;s Lor&auml;nd University)</i>
 <p>C-Reduce, released by Regehr et al. in 2012, is an excellent tool designed
 to generate a minimal test case from a C/C++ file that has some specific
 property (e.g. triggers a bug). One of the most interesting parts of C-Reduce
 is Clang Delta, which is a set of compiler-like transformations implemented
 using Clang libraries. Clang Delta includes transformations like changing a
 function parameter to a global variable etc. </p>
 <p>With the introduction of the experimental cross translation unit analysis
 feature in the Clang Static Analyzer, there arose a need to investigate
 crashes, bugs, or false positive reports that spread across different
 translation units. Unfortunately, C-Reduce was designed to minimize one
 translation unit at a time, and some of the Clang Delta transformations cannot
 be applied to multiple TUs in their original form.</p>
 <p>This talk/poster is a status report about a work in progress that aims to
 make it possible to use C-Reduce for cross translation unit test case
 reduction.</p>
 </td></tr>
 </table>

 <div class="www_sectiontitle" id="BoF">BoFs</div>
 <table cellpadding="10">

 <tr><td valign="top" id="BoF_1">
 <b>RFC: Towards Vector Predication in LLVM IR</b><br>
 <i>Simon Moll (Saarland University), Sebastian Hack (Saarland University)</i>
 <p>In this talk, we present the current state of the Explicit Vector Length
 extension for LLVM. EVL is the first step towards proper predication and active
 vector length support in LLVM IR. There has been a recent surge in vector ISAs,
 let it be the RISC-V V extension, ARM SVE or NEC SX-Aurora, all of which pose
 new demands to LLVM IR. Among their novel features are an active vector length,
 full predication on all vector instructions and a register length that is
 unknown at compile time. In this talk, we present the Explicit Vector Length
 extension (EVL) for LLVM IR. EVL provides primitives that are practical for
 both, backends and IR-level automatic vectorizers. At the same time, EVL is
 compatible with LLVM-SVE and even existing short SIMD ISAs stand to benefit
 from its consistent handling of predication.</p>
 </td></tr>

 <tr><td valign="top" id="BoF_2">
 <b>IPO --- Where are we, where do we want to go?</b><br>
 <i>Johannes Doerfert (Argonne National Laboratory), Kit Barton (IBM Toronto Lab)</i>
 <p>Interprocedural optimizations (IPOs) have been historically weak in LLVM.
 The strong reliance on inlining can be seen as a consequence or cause. Since
 inlining is not always possible (parallel programs) or beneficial (large
 functions), the effort to improve IPO has recently seen an upswing again
 [0,1,2]. In order to capitalize this momentum, we would like to talk about the
 current situation in LLVM, and goals for the immediate, but also distant,
 future.</p>
 <p>This open-ended discussion is not aimed at a particular group of people. We
 expect to discuss potential problems with IPO, as well as desirable analyses
 and optimizations, both experts and newcomers are welcome to attend.</p>
 <p>[0] <a
   href="https://lists.llvm.org/pipermail/llvm-dev/2018-August/125537.html">
   https://lists.llvm.org/pipermail/llvm-dev/2018-August/125537.html</a></p>
 <p>[1] These links do not yet exist but will be added later on.</p>
 <p>[2] One link will be an RFC outlining missing IPO capabilities, the other
 will point to a function attribute deduction rewrite patch (almost
 finished).</p>
 </td></tr>

 <tr><td valign="top" id="BoF_3">
 <b>LLVM binutils</b>
 [ <a href="http://lists.llvm.org/pipermail/llvm-dev/2019-April/132032.html">Notes</a> ]
 [ <a href="slides/BoF-HendersonRupprecht-LLVM_binutils.pdf">Slides</a> ]<br>
 <i>James Henderson (SN Systems), Jordan Rupprecht (Google)</i>
 <p>LLVM has a suite of binary utilities that broadly mirror the GNU binutils
 suite, with tools such as llvm-readelf, llvm-nm, and llvm-objcopy. These tools
 are already widely used in testing the rest of LLVM, and are now starting to be
 adopted as full replacements for the GNU tools in production environments.</p>
 <p>This discussion will focus on what more needs to be done to make this
 migration process easier, how far we need to go to make drop-in replacements
 for the GNU tools, and what features people want to prioritize. Finally, we
 will look at the broader future goals of these tools.</p>

 <tr><td valign="top" id="BoF_4">
 <b>RFC: Reference OpenCL Runtime library for LLVM</b><br>
 <i>Andrew Savonichev (Intel), Alexey Sachkov (Intel)</i>
 <p>LLVM is used as a foundation for majority of OpenCL compilers, thanks to
 excellent support of OpenCL C language in Clang frontend, and modularity of
 LLVM. Unfortunately, a compiler is not the only component that is required to
 develop using OpenCL: users need a runtime library that implements the OpenCL
 API. While there are several implementations of OpenCL runtime exist, both open
 and proprietary, they do not have a community wide adoption. This leads to
 fragmentation and effort duplication across OpenCL community, and negatively
 impacts OpenCL ecosystem in general.</p>
 <p>The purpose of this BoF is to bring all parties interested in getting a
 reference OpenCL Runtime implementation in LLVM, that is designed to be easily
 extendable to support various accelerator devices (CPU/GPU/FPGA/DSP) and allow
 users and compiler developers to rapidly prototype OpenCL specific
 functionality in LLVM and Clang.</p>
 </td></tr>

 <tr><td valign="top" id="BoF_5">
 <b>LLVM Interface Stability Guarantees BoF</b><br>
 <i>Stephen Kelly</i>
 <p>The goal of this BoF is to create the basis for a new page of documentation
 enumerating the stability guarantees of interfaces exposed from LLVM
 products.</p>
 <p>There are some interfaces which are known to make no stability guarantees,
 such as the Clang C++ API, others which make strict API guarantees, such as the
 libclang C API, and still others, such as the LLVM IR API which is somewhere in
 between. Only the latter appears in the LLVM Developer Policy. Mostly the rest
 of the interface stability guarantees are tribal knowledge.</p>
 <p>A centralized location in the documentation for this documentation would
 present guidelines for developers to follow when changing various parts of LLVM
 code, and inform consumers what they can expect and rely upon when using
 interfaces. This includes code interfaces and command line interfaces.</p>
 </td></tr>

 <tr><td valign="top" id="BoF_6">
 <b>Clang Static Analyzer BoF</b><br>
 <i>Devin Coughlin (Apple), Gabor Horvath (Eotvos Lorand University)</i>
 <p>Let&#39;s discuss the present and future of the Clang Static Analyzer!
 We&#39;ll start with a brief overview of analyzer features the community has
 added over the last year. We&#39;ll then dive into a discussion of possible
 focus areas for the next year, including potential deeper integration with
 clang-tidy.</p>
 </td></tr>

 <tr><td valign="top" id="BoF_7">
 <b>LLVM Numerics Improvements</b><br>
 <i>Michael Berg (Apple), Steve Canon (Apple)</i>
 <p>Some LLVM based compilers currently provide two modes of floating point code
 generation. The first mode, called fast-math, is where performance is the
 primary consideration over numerical precision and accuracy. This mode does not
 strictly follow the IEEE-754 standard, but has proven useful for applications
 that do not require this level of precision. The second mode, called
 precise-math, is where the compiler carefully follows the subset of behavior
 defined in the IEEE standard that is applicable to conforming hardware targets.
 This mode is primarily used for compute workloads and wherever fast-math
 precision is inadequate, however it runs much slower as it requires a larger
 number of instructions in general. In practice neither of these modes is
 particularly desirable. The fast-math mode ignores a significant portion of the
 standard as pertains to handling undefined values described as Not a Number
 (NaNs) and Infinities (INFs), resulting in difficulties for certain workloads
 when the hardware target computes these values correctly and performance
 remains critical.</p>
 <p>Until recently these two models were mutually exclusive, however with the
 addition of IR flags they need not be.  For instance, the FastMath metadata
 module flag drives behavior deemed numerically unsafe when it is enabled, by
 indiscriminately enabling optimizations. With IR flags this behavior can be
 enabled with much finer granularity, allowing various code forms to be fast or
 precise together in one module.  We call this mixed mode compilation.  IR flags
 can be used individually or paired to produce desired floating point behavior
 under specified constraints with fine granularity of control.  Optimization
 passes have been modified under this new kind of control to produce this
 behavior.  This talk will describe the recent numerics work and discuss the
 implications for front-ends and backends built with LLVM.</p>
 </td></tr>

 <tr><td valign="top" id="BoF_8">
 <b>LLVM Foundation BoF</b><br>
 <i>LLVM Foundation Board of Directors</i>
 <p>Ask the LLVM Foundation Board of Directors anything, get program updates.</p>
 </td></tr>
 </table>

 <div class="www_sectiontitle" id="Poster">Posters</div>
 <table cellpadding="10">

 <tr><td valign="top" id="Poster_1">
 <b>Clava: C/C++ source-to-source from CMake using LARA</b>
  [ <a href="slides/Poster-Bispo-Clava_CC++_source_to_source_from_CMake_using_LARA.pdf">Poster</a> ]<br>
 <i>Jo&atilde;o Bispo (FEUP/INESCTEC)</i>
 <p>Clava is a Clang-based source-to-source compiler that executes scripts
 written in LARA, a superset of JavaScript with additional syntax for AST
 analysis and transformation.</p>
 <p>Clava intends to improve on Clang&#39;s source-to-source capabilities, by
 providing a more convenient and powerful way to analyze, transform and generate
 C/C++ code.</p>
 <p>Although Clava is a stand-alone tool, we will present the Clava CMake
 plug-in, which allows to easily apply LARA scripts to C/C++ CMake projects.
 Clava is open-source and runs on Linux, Windows and MacOS.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_2">
 <b>Safely Optimizing Casts between Pointers and Integers</b>
  [ <a href="slides/Poster-Lee-Safely_optimizing_casts_between_pointers_and_integers.pdf">Poster</a> ]<br>
 <i>Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul
   National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu
   (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P.
 Lopes (Microsoft Research, UK)</i>
 <p>In this talk, a list of optimizations that soundly removes casts between
 pointers and integers will be presented. In LLVM, a pointer is more than just
 an integer: LLVM allows a pointer to track its underlying object, and the rule
 to find it is defined as based-on relation. This allows LLVM to aggressively
 optimize load/stores, but makes the meaning of pointer-integer casts
 complicated. This causes conflict between existing optimizations, causing
 long-standing miscompilation bugs like 34548.</p>
 <p>To fix it, we suggest disabling folding of inttoptr(ptrtoint(p)) to p and
 using a safe workaround to remove them. This optimization is important because
 it&#39;s removing a significant portion of such cast pairs. We&#39;ll show that
 even if the optimization is disabled, majority of casts can be removed by
 carefully adding new \&amp; modifying existing optimizations. After the
 updates, the performance is still comparable to the original LLVM.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_3">
 <b>Scalar Evolution Canon: Click! Canonicalize SCEV and validate it by Z3 SMT solver!</b>
  [ Poster ]<br>
 <i>Lin-Ya Yu (Xilinx), Alexandre Isoard (Xilinx)</i>
 <p>A scalar evolution(SCEV) is an analyzed expression. It represents how the
 value of scalar variables changes in a program when we execute the code[0]. It
 is implemented as a pass and is well-used in many analysis and optimizations in
 LLVM, such as loop strength reduction, induction variable substitution, and
 memory access analysis. However, it is difficult to have a canonical form for
 SCEV that can meet all other passes needs. Here, we develop SCEV Canon to do
 canonicalization and further simplification on SCEV.</p>
 <p>A satisfiability modulo theories(SMT) solver from Microsoft Research, Z3, is
 introduced in this work to verify the correctness of canonicalized SCEV.
 Moreover, Z3 can also help us check the equivalence of SCEVs between different
 SCEV implementation in different released of LLVM. This poster shares the whole
 process of how to canonicalize SCEV without modifying the scalar evolution
 pass, verify and test the generated SCEV. We also try to open a discussion
 about some simplification that can be done on SCEV.</p>
 <p>[0] <a
   href="https://subscription.packtpub.com/book/application_development/9781785280801/5/ch05lvl1sec36/scalar-evolution">
   https://subscription.packtpub.com/book/application_development/9781785280801/5/ch05lvl1sec36/scalar-evolution</a></p>
 </td></tr>

 <tr><td valign="top" id="Poster_4">
 <b>Splendid GVN: Partial Redundancy Elimination for Algebraic Simplification</b>
  [ <a href="slides/Poster-Her-SplendidGVN_Partial_redundancy_elimination.pdf">Poster</a> ]<br>
 <i>Li-An Her (National Tsing Hua University), Jenq-Kuen Lee (National Tsing Hua University)</i>
 <p>Modern computation of Neural Network, signal processing of GPS and Wifi,
 image processing, etc, highly depends on enormous linear algebra operations.
 Algebraic simplification improves performance for more and more complicated
 computation such as convolutions for CNN and Sobel operator, inner products for
 discrete cosine transform and FFT of signal processing, etc. LLVM IR provides
 several passes of optimization for algebraic simplification, constant folding,
 copy propagation, etc. One is global value numbering (GVN). These passes work
 fine except encountering branches and non-local cases. One case is partial
 redundancy elimination (PRE). At least two instructions are redundant or
 congruent, but they are in different blocks. Even though elimination of one
 redundant won&#39;t lead to logic error, compiler lacks such rule and ignores
 such elimination. Thus, algebraic simplification fails to optimize code when
 PRE occurs. GVN provides PRE mechanism with lazy code motion, but it cannot
 provide more accurate congruence information due to loops and &Phi;-nodes. New
 GVN handles such case and provides more delicate congruence information, but it
 lacks mechanism for and ignores PRE.</p>
 <p>In this paper, we propose Splendid GVN which inserts PRE mechanism for New
 GVN on LLVM 7.0.0. When PRE happens, our pass checks safety and applies hoist
 code motion to eliminate partial redundancy. Original GVN applies less accurate
 algorithm and can only perform lazy code motion, which takes risk for
 increasing code size. Original Hoist GVN cannot handle PRE and utilizes GVN
 instead of New GVN, which cannot provide more delicate information due to loops
 and may miss opportunity for further elimination. Experiments show that our
 Splendid GVN performs hoist code motion for PRE on 2 qualified PRE programs
 from LLVM test directory for GVN (available in source code). Splendid GVN
 reduces total code size with -18.37% and -7% compared to original 2 programs
 and New GVN results.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_5">
 <b>An alternative OpenMP Backend for Polly</b>
  [ <a href="slides/Poster-Halkenhauser-An_alternative_OpenMP_backend_for_Polly.pdf">Poster</a> ]<br>
 <i>Michael Halkenh&auml;user (TU Darmstadt)</i>
 <p>LLVM&#39;s polyhedral infrastructure framework Polly may automatically
 exploit thread-level parallelism through OpenMP. Currently, the user can only
 influence the number of utilized threads, while other OpenMP parameters such as
 the scheduling type and chunk size are set to fixed values. This in turn,
 limits a user&#39;s ability to adapt the optimization process for a given
 problem.</p>
 <p>In this work, we present an alternative OpenMP backend for Polly, which
 provides additional customization options to the user and is based on the LLVM
 OpenMP runtime. We evaluate our new backend and the influence of the new
 customization options on performance and compare to Polly&#39;s existing OpenMP
 backend.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_6">
 <b>Does the win32 clang compiler executable really need to be over 21MB in size?</b>
  [ <a href="slides/Poster-Gallop-Does_the_win32_clang_compiler_reaaly_nee_to_be_over_21MB_in_size.png">Poster</a> ]<br>
 <i>Russell Gallop (SN System), Greg Bedwell (SN Systems)</i>
 <p>The title of this lighting talk is from a bug filed in the early days of the
 PS4 compiler. It noted that the LLVM-based PS4 compiler was more than 3 times
 larger than the PS3 compiler. Since then it has almost doubled to over 40MB.
 For a compiler which targets one system this seems excessive. Executable size
 can cost in worse cache performance and cost time if transferring for
 distributed builds.</p>
 <p>In this lightning talk I will look at where this comes from and how it can
 be managed.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_7">
 <b>Enabling Multi- and Cross-Language Verification with LLVM</b>
  [ <a href="slides/Poster-Garzella-Enabling_multi_and_cross_language_verification_with_LLVM.pdf">Poster</a> ]<br>
 <i>Jack J. Garzella (University of Utah), Marek Baranowski (University of Utah),
 Shaobo He (University of Utah), Zvonimir Rakamaric (University of Utah)</i>
 <p>Developers nowadays regularly use numerous programming languages with
 different characteristics and trade-offs. Unfortunately, implementing a
 software verifier for a new language from scratch is a large and tedious
 undertake, requiring expert knowledge in multiple domains, such as compilers,
 verification, and constraint solving. Hence, only a tiny fraction of the used
 languages has readily available software verifiers to aid in the development of
 correct programs. In the past decade, there has been a trend of leveraging
 popular compiler intermediate representations (IRs), such as LLVM IR, when
 implementing software verifiers. The main advantage is to avoid implementing
 large front-ends, and instead rely on a typically simple canonical format of an
 IR. In addition, processing IR promises out-of-the-box multi- and
 cross-language verification since, at least in theory, a verifier ought to be
 able to handle a program in any programming language (and their combination)
 that can be compiled into the IR. In practice though, to the best of our
 knowledge, nobody has explored the feasibility and ease of such integration of
 new languages. This talk introduces a methodology for adding support for a new
 language into an IR-based verification toolflow. Using our methodology, we
 extend an existing verifier called SMACK with support for 7 additional
 languages. We assess the quality of our extensions and the proposed methodology
 through several case studies, and we describe the lessons we learned in the
 process.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_8">
 <b>Instruction Tracing and dynamic codegen analysis to identify unique llvm performance issues.</b>
  [ <a href="slides/Poster-Biplop-Tracing.pdf">Poster</a> ]<br>
 <i>Biplob (IBM)</i>
 <p>Performance analysis of the machine code generated by a compiler can be
 carried out in different ways and can also be based on application in question.
 Common methods use some form of profiling on a running program which generally
 provides the statistical information about certain data and events. While this
 method does give important insights to a performance problem, some of the
 issues are more clearly understood when the compiled applications is actually
 run and the dynamic instructions of hot code execution paths are traced and
 analyzed in a small execution window. Trace records contain instructions and
 data, memory addresses and other information which provide complete visibility
 into the workings of an application.</p>
 <p>While tracing is very useful in micro-architecture analysis we will stick to
 how these traces can benefit compiler performance analysis. In this talk we
 will look at some of these code-gen issues which were better identified when a
 running application compiled by llvm and other compilers were traced for hot
 code sections on IBM Power9 processor.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_9">
 <b>Handling all Facebook requests with JITed C++ code</b>
  [ <a href="slides/Poster-Zhou-Handling_all_Facebook_requests_with_JITed_C++_code.pdf">Poster</a> ]<br>
 <i>Huapeng Zhou (Facebook), Yuhan Guo (Facebook)</i>
 <p>Facebook needs an efficient scripting framework to enable fast iteration of
 HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine and
 code deployment ecosystem was created to compile/link/execute C++ script at
 run-time, using Clang and LLVM ORC APIs. The framework allows developers to
 write business logic and unit test in C++ script, as well as debug using GDB.
 Profiling using perf is also supported for PGO purpose. This new framework
 outperformed another previously used scripting language by up to 4X, measured
 in execution time.</p>
 <p>In order to power the C++ script in ABI compatible way, a PCH (pre-compiled
 header) is built statically to provide declarations and definitions of
 necessary dependent types and methods. Clang APIs are then used at run-time to
 transform source code to LLVM IR, which are later passed through LLVM ORC
 layers for linking/optimizing. Above Clang/LLVM toolchains are statically
 linked into main binary to ensure compatibility between PCH and C++ scripts. As
 a result, scripts could be deployed in real time without any main binary
 change.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_10">
 <b>Implementing SPMD control flow in LLVM using reconverging CFGs</b>
  [ <a href="slides/Poster-Wahlster-Implementing_SPMD_control_flow_in_LLVM_using_reconverging_CFG.pdf">Poster</a> ]<br>
 <i>Fabian Wahlster (Technische Universit&auml;t M&uuml;nchen), Nicolai H&auml;hnle (Advanced Micro Devices)</i>
 <p>Compiling programs for an SPMD execution model, e.g. for GPUs or for whole
 program vectorization on CPUs, requires a transform from the thread-level input
 program into a vectorized wave-level program in which the values of the
 original threads are stored in corresponding lanes of vectors.  The main
 challenge of this transform is handling divergent control flow, where threads
 take different paths through the original CFG.  A common approach, which is
 currently taken by the AMDGPU backend in LLVM, is to first structurize the
 program as a simplification for subsequent steps.</p>
 <p>However, structurization is overly conservative.  It can be avoided when
 control flow is uniform, i.e. not divergent.  Even where control flow is
 divergent, structurization is often unnecessary.  Moreover, LLVM&#39;s
 StructurizeCFG pass relies on region analysis, which limits the extent to which
 it can be evolved.</p>
 <p>We propose a new approach to SPMD vectorization based on saying that a CFG
 is reconverging if for every divergent branch, one of the successors is a
 post-dominator.  This property is weaker than structuredness, and we show that
 it can be achieved while preserving uniform branches and inserting fewer new
 basic blocks than structurization requires.  It is also sufficient for code
 generation, because it guarantees that threads which &quot;leave&quot; a wave
 at divergent branches will be able to rejoin it later.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_11">
 <b>LLVM for the Apollo Guidance Computer</b>
  [ Poster ]<br>
 <i>Lewis Revill (University of Bath)</i>
 <p>Nearly 50 years ago on the 20th of July 1969 humans set foot on the moon for
 the first time. Among the many extraordinary engineering feats that made this
 possible was the Apollo Guidance Computer, an innovative processor for its time
 with an instruction set that was thought up well before the advent of C. So 50
 years later, why not implement support for it in a modern compiler such as
 LLVM?</p>
 <p>This talk will give a brief overview of some of the architectural features
 of the Apollo Guidance Computer followed by an account of my implementation of
 an LLVM target so far. The shortcomings of LLVM when it comes to implementing
 such an unusual architecture will be discussed along with the workarounds used
 to overcome them.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_12">
 <b>LLVM Miner: Text Analytics based Static Knowledge Extractor</b>
  [ <a href="slides/Poster-Hameeza-LLVM_Miner_Text_Analytics_based_Static_Knowledge_Extractor.pdf">Poster</a> ]<br>
 <i>Hameeza Ahmed (NED University of Engineering and Technology), Muhammad Ali
   Ismail (NED University of Engineering and Technology)</i>
 <p>Compiler converts high level language code into assembly language by
 enabling optimizations. There are three phases in compiler namely front end,
 middle end and backend. Low Level Virtual Machine (LLVM) is an open source
 framework enabling provision of all these three stages. One of the reasons of
 huge adoption of LLVM is its powerful optimizer or middle end stage. There
 exist various opportunities to optimize given Intermediate Representation (IR)
 code generated by front end. Before applying any optimization significant
 efforts are dedicated for detailed analysis of given IR in order to extract
 static information hidden in source code.</p>
 <p>Up till now, there exists a standard mechanism to analyze IR code by using
 analysis passes written in LLVM itself. Each time some information is required
 from IR, a pass is written or reused in LLVM core syntax. This approach is
 proved to be complex for novice programmers who are unfamiliar with the LLVM
 coding style having hard core C++ concepts. This way a significant amount of
 time is spent on learning LLVM programming than doing the required compile time
 code analysis. In this regard, an easier mechanism is needed to perform static
 code analysis in LLVM.</p>
 <p>In this work, LLVM miner is presented to simplify static IR level analysis
 in LLVM compiler tool. LLVM miner performs text analytics in order to extract
 related information from given IR code. The IR generated from front end is
 passed through the proposed miner where static hidden features are extracted
 easily. The proposed approach has been tested using set of 5 mixed benchmark
 codes namely bfs, connected components, grep, histogram, and kmeans. The
 experiments are conducted using R script for determining the instruction
 frequency and application trend. Instruction frequency shows count of each
 instruction in given IR code. It is represented by means of bar graph and word
 cloud. Then application trend is obtained by clustering individual instructions
 in certain categories such as branch, compute, function calls, IO read write,
 memory consumption, and memory read write operations of each instruction.
 Application trend shows proportion of different classes of operations in a
 given code using bar graphs. It enables us to know whether application is
 compute bound, or memory bound or I/O bound etc by using static code level
 features. The analysis of LLVM IR using text mining techniques appears to be a
 promising direction towards studying significant features hidden in source
 code. The text analytics of given IR is expected to be an easier and less
 costly solution both in terms of time and efforts, as compared to the
 conventional LLVM analysis passes.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_13">
 <b>Function Merging by Sequence Alignment</b>
  [ <a href="slides/Poster-Rocha-Function_merging_by_sequence_alignment.pdf">Poster</a> ]<br>
 <i>Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of Edinburgh), Hugh Leather (University of Edinburgh)</i>
 <p>Resource-constrained devices for embedded systems are becoming increasingly
 important. In such systems, memory is highly restrictive, making code size in
 most cases even more important than performance. Compared to more traditional
 platforms, memory is a larger part of the cost and code occupies much of it.
 Despite that, compilers make little effort to reduce code size. One key
 technique attempts to merge the bodies of similar functions. However,
 production compilers only apply this optimization to identical functions, while
 research compilers improve on that by merging the few functions with identical
 control-flow graphs and signatures. Overall, existing solutions are
 insufficient and we end up having to either increase cost by adding more memory
 or remove functionality from programs.</p>
 <p>We introduce a novel technique that can merge arbitrary functions through
 sequence alignment, a bioinformatics algo- rithm for identifying regions of
 similarity between sequences. We combine this technique with an intelligent
 exploration mechanism to direct the search towards the most promising function
 pairs.  Our approach is more than 2.4x better than the state-of-the-art,
 reducing code size by up to 25%, with an overall average of 6%, while
 introducing an average compilation-time overhead of only 15%. When aided by
 profiling information, this optimization can be deployed without any
 significant impact on the performance of the generated code.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_14">
 <b>Compilation and optimization with security annotations</b>
  [ <a href="slides/Poster-TuanVu-Compilation_and_optimization_with_security_annotations.pdf">Poster</a> ]<br>
 <i>Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM),
   Albert Cohen (Google)</i>
 <p>Program analysis and program transformation systems need to express
 additional program properties, to specify test and verification goals, and to
 enhance their effectiveness. Such annotations are typically inserted to the
 representation on which the tool operates; e.g., source level for establishing
 compliance with a specification, and binary level for the validation of secure
 code. While several annotation languages have been proposed, these typically
 target the expression of functional properties. For the purpose of implementing
 secure code, there has been little effort to support non-functional properties
 about side-channels or faults. Furthermore, analyses and transformations making
 use of such annotations may target different representations encountered along
 the compilation flow.</p>
 <p>We extend an annotation language to express a wider range of functional and
 non-functional properties, enabling security-oriented analyses and influencing
 the application of code transformations along the compilation flow. We
 translate this language to the different compiler representations from abstract
 syntax down to binary code. We explore these concepts through the design and
 implementation of an optimizing, annotation-aware compiler, capturing
 annotations from the program source, propagating and emitting them in the
 binary, so that binary-level analysis tools can use them.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_16">
 <b>Leveraging Polyhedral Compilation in Chapel Compiler</b>
  [ <a href="slides/Poster-Yerawar-ChapelPolly.pdf">Poster</a> ]<br>
 <i>Sahil Yerawar (IIT Hyderabad), Siddharth Bhat (IIIT Hyderabad), Michael Ferguson (Cray Inc.), Philip Pfaffe (Karlsruhe Institute of Technology), Ramakrishna Upadrasta (IIT Hyderabad)</i>
 <p>Chapel is an emerging parallel programming language developed with the aim
 of providing better performance in High-Performance Computing as well as
 accessibility to the newcomer programmers. It relies on LLVM as one of its
 backends. This talk shows how the polyhedral compilation techniques available
 in Polly are utilized by the Chapel Compiler. We will share our experience of
 using Polly&#39;s Loop Optimizer in a new setting with Polly &amp; LLVM
 Developers.In particular, the talk will discuss how the Chapel compiler can
 benefit from the optimizations available in Polly including GPGPU code
 generation.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_17">
 <b>LLVM on AVR - textual IR as a powerful tool for making &quot;impossible&quot; compilers</b>
  [ Poster ]<br>
 <i>Carl Peto (Swift for Arduino/Petosoft)</i>
 <p>To be demonstrated on stage and available to use and test, I have built a
 prototype compiler for the a subset of the Swift language onto the Arduino UNO
 platform, which is a radically different use for the language. Despite the
 Swift compiler and front end having limited support for such a different back
 end.</p>
 <p>Key to the success was separation of the first part of the compilation into
 textual LLVM IR (using a standard toolchain), followed by compilation from LLVM
 IR files into machine code using a custom built llc. This approach improves
 debugging, especially of deployed product, and separation of concerns.
 Ultimately it could be used as a template for other &quot;impossible&quot;
 compilers such as Swift to WebAssembly, Go to OpenGL shaders and more.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_18">
 <b>Vectorizing Add/Sub Expressions with SLP</b>
  [ <a href="slides/Poster-Porpodas-Supernode_SLP.pdf">Poster</a> ]<br>
 <i>Vasileios Porpodas (Intel Corporation, USA), Rodrigo C. O. Rocha (University
   of Edinburgh, UK), Evgueni Brevnov (Intel Corporation, USA), Lu&iacute;s F.
   W. G&oacute;es (PUC Minas, Brazil), Timothy Mattson (Intel Corporation,
 USA)</i>
 <p>The SLP Vectorizer is LLVM&#39;s second vectorizer (after the Loop
 Vectorizer). It performs auto-vectorization of straight-line code. It works by
 first exploring the scalar code for vectorizable patterns (groups), and then by
 replacing each group with its vectorized form. </p>
 <p>This talk presents the existing design of the SLP vectorizer and shows how
 it fails to vectorize simple IR inputs with Add/Sub (or Mul/Div) expression
 trees. We propose specific improvements to the current design that will let us
 effectively handle such code. We named this design SuperNode SLP (SN-SLP)
 because it extends the SLP graph to include new &quot;fat&quot; nodes that
 include multiple instructions. This talk also presents our detailed plan for
 upstreaming the bulk of this work in a sequence of patches.</p>
 </td></tr>

 <tr><td valign="top" id="Poster_19">
 <b>Adding support for C++ contracts to Clang</b>
  [ <a href="slides/Poster-Lopez-Gomez-Adding_support_for_C++_contracts_to_Clang.pdf">Poster</a> ]<br>
 <i>Javier L&oacute;pez-G&oacute;mez (University Carlos III of Madrid), J.
   Daniel Garc&iacute;a (University Carlos III of Madrid)</i>
 <p>A language supporting contract-checking allows to detect programming errors.
 Also, making this information available to the compiler may cause it to perform
 additional optimizations.</p>
 <p>This paper presents our implementation of the P0542R5 technical
 specification (now part of the C++20 working draft).</p>
 </td></tr>

 <tr><td valign="top" id="Poster_20">
 <b>Optimizing Nondeterminacy: Exploiting Race Conditions in Parallel Programs</b>
  [ Poster ]<br>
 <i>William S. Moses (MIT CSAIL)</i>
 <p>As computation moves towards parallel programming models, writing efficient
 parallel programs becomes paramount. As a result, there have been several
 efforts (Tapir, HPVM, among others) to augment serial compilers such as LLVM to
 have a first-class representation of parallelism. Such representations
 theoretically permit the compiler to both analyze and optimize parallel
 programs.</p>
 <p>A major difference between serial and parallel programs is that in many
 parallel runtimes, one cannot make any assumptions about the ordering of
 various logical tasks. This nondeterminism creates an opportunity for the
 compiler. Since any ordering is valid, the compiler can also reorder tasks if
 it believes it beneficial.</p>
 <p>This talk will discuss how the compiler can take advantage of this
 nondeterminacy through a number of example optimizations, taking a look at
 their theoretical implications as well as how they perform when implemented
 atop the Tapir extension to LLVM.</p>
 </td></tr>
 </table>

 <!-- *********************************************************************** -->

 <!--#include virtual="sponsors.incl" -->

 <hr>

 <!--#include virtual="../../footer.incl" -->