
 <!--#include virtual="header.incl" -->

 <div class="www_sectiontitle">Open LLVM Projects</div>

 <ul>
   <li>Google Summer of Code Ideas &amp; Projects
     <ul>
       <li>
         <a href="#gsoc24">Google Summer of Code 2024</a>
         <ul>
           <li><b>LLVM Core</b>
             <ul>
               <li><a href="#remove_ub_tests">Remove undefined behavior from tests</a></li>
               <li><a href="#spirv_tablegen">Automatically generate TableGen file for SPIR-V instruction set</a></li>
               <li><a href="#bitstream_cas">LLVM bitstream integration with CAS (content-addressable storage)</a></li>
               <li><a href="#three_way_comparison">Add 3-way comparison intrinsics</a></li>
               <li><a href="#llvm_www">Improve the LLVM.org Website Look and Feel</a></li>
               <li><a href="#parameter-tuning">The 1001 thresholds in LLVM</a></li>
            </ul>
           <li><a href="http://clang.llvm.org/"><b>Clang</b></a>
             <ul>
               <li><a href="#clang-repl-out-of-process">Out-of-process execution for clang-repl</a></li>
               <li><a href="#clang-plugins-windows">Support clang plugins on Windows</a></li>
               <li><a href="#clang-on-demand-parsing">On Demand Parsing in Clang</a></li>
               <li><a href="#clang-doc-improve-usability">Improve Clang-Doc Usability</a></li>
             </ul>
           <li><a href="http://lldb.llvm.org/"><b>LLDB</b></a>
             <ul>
               <li><a href="#rich-disassembler-for-lldb">Rich disassembler for LLDB</a></li>
             </ul>
           <li><a href="http://openmp.llvm.org/"><b>(OpenMP) Offload</b></a>
             <ul>
               <li><a href="#gpu-delta-debugging">GPU Delta Debugging</a></li>
               <li><a href="#offload-libcxx">Offloading libcxx</a></li>
               <li><a href="#gpu-libc">Performance tuning the GPU libc</a></li>
               <li><a href="#gpu-first">Improve GPU First Framework</a></li>
             </ul>
           <li><a href="https://clangir.org"><b>ClangIR</b></a>
             <ul>
               <li><a href="#clangir-gpu">Compile GPU kernels using ClangIR</a></li>
             </ul>
           <li><a href="http://libc.llvm.org/"><b>LLVM libc</b></a>
             <ul>
               <li><a href="#half-precision-libc">Half precision in LLVM libc</a></li>
             </ul>
           </ul>

       </li>

       <li>
         <a href="#gsoc23">Google Summer of Code 2023</a>
         <ul>
           <li>
             <b>LLVM Core</b>
             <ul>
                <li><a href="#llvm_new_jitlink_reopt">Re-optimization using JITLink</a></li>
                <li><a href="#llvm_new_jitlink_backends">JITLink new backends</a></li>
                <li><a href="#llvm_improving_compile_times">Improving compile times</a></li>
                <li><a href="#llvm_addressing_rust_optimization_failures">Addressing Rust optimization failures</a></li>
                <li><a href="#llvm_mlgo_latency_model">Better performance models for MLGO training</a></li>
                <li><a href="#llvm_mlgo_passes_2023">Machine Learning Guided Ordering of Compiler Optimization Passes</a></li>
                <li><a href="#llvm_map_value_to_src_expr">Map LLVM values to corresponding source-level expressions</a></li>
             </ul>
           </li>
           <li><a href="http://clang.llvm.org/"><b>Clang</b></a>
             <ul>
               <li><a href="#clang-repl-out-of-process">Out-of-process execution for clang-repl</a></li>
               <li><a href="#clang_analyzer_taint_analysis">Improve and Stabilize the Clang Static Analyzer's "Taint Analysis" Checks</a></li>
               <li><a href="#clang-repl-autocompletion">Implement autocompletion in clang-repl</a></li>
               <li><a href="#clang-modules-build-daemon">Modules build daemon: build system agnostic support for explicitly built modules</a></li>
               <li><a href="#clang-extract-api-categories">ExtractAPI Objective-C categories</a></li>
               <li><a href="#clang-extract-api-cpp-support">ExtractAPI C++ Support</a></li>
               <li><a href="#clang-extract-api-while-building">ExtractAPI while building</a></li>
               <li><a href="#clang-improve-diagnostics2">Improve Clang diagnostics</a></li>
               <li><a href="#clang-tutorials-clang-repl">Tutorial development with clang-repl</a></li>
               <li><a href="#clang-repl-wasm">Add WebAssembly Support in clang-repl</a></li>
             </ul>
           </li>

           <li>
             <b>LLD</b>
             <ul>
               <li><a href="#llvm_lld_embedded">LLD Linker Improvements for Embedded Targets</a></li>
             </ul>
           </li>

           <li>
             <b>MLIR</b>
             <ul>
               <li><a href="#llvm_mlir_presburger_opt">Optimizing MLIR’s Presburger library</a></li>
               <li><a href="#llvm_mlir_query">Interactively query MLIR IR</a></li>
             </ul>
           </li>

           <li>
             <b>Code Coverage</b>
             <ul>
               <li><a href="#llvm_code_coverage">Support a hierarchical directory structure in generated coverage html reports</a></li>
               <li><a href="#llvm_patch_coverage">Patch based test coverage for quick test feedback</a></li>
             </ul>
           </li>

           <li>
             <b>ClangIR</b>
             <ul>
               <li><a href="#clangir">Build and run SingleSource benchmarks using ClangIR</a></li>
             </ul>
           </li>

           <li>
             <b><a href="https://enzyme.mit.edu">Enzyme</a></b>
             <ul>
               <li><a href="#enzyme_tblgen_extension">Move additional Enzyme Rules to Tablegen</a></li>
             </ul>
           </li>

         </ul>
       </li>

       <li>
         <a href="#gsoc22">Google Summer of Code 2022</a>
         <ul>
           <li>
             <b>LLVM Core</b>
             <ul>
               <li><a href="#llvm_shared_jitlink">Implement a shared-memory based JITLinkMemoryManager for out-of-process JITting</a></li>
               <li><a href="#llvm_build_jit_tutorial">Modernize the LLVM "Building A JIT" tutorial series</a></li>
               <li><a href="#llvm_jit_new_format">Write JITLink support for a new format/architecture</a></li>
               <li><a href="#llvm_instrumentaion_for_compile_time">Instrumentation of Clang/LLVM for Compile Time</a></li>
               <li><a href="#llvm_lto_dependency_info">Richer symbol dependency information for LTO</a></li>
               <li><a href="#llvm_mlgo_passes">Machine Learning Guided Ordering of Compiler Optimization Passes</a></li>
               <li><a href="#llvm_mlgo_loop">Learning Loop Transformation Heuristics</a></li>
               <li><a href="#llvm_module_inliner">Evaluate and Expand the Module-Level Inliner</a></li>
               <li><a href="#llvm_undef_load">Remove undef: move uninitialized memory to poison</a></li>
               <li><a href="#llvm_abi_export">Add ABI/API export annotations to the LLVM build</a></li>
             </ul>
           </li>

           <li><a href="http://clang.llvm.org/"><b>Clang</b></a>
             <ul>
               <li><a href="#clang-template-instantiation-sugar">Extend clang AST to
                 provide information for the type as written in template
                 instantiations</a>
               </li>
               <li><a href="#clang-sa-structured-bindings">Implement support for
                 C++17 structured bindings in the Clang Static Analyzer</a>
               </li>
               <li><a href="#clang-improve-diagnostics">Improve Clang Diagnostics</a>
               </li>
             </ul>
           </li>

           <li>
             <a href="https://polly.llvm.org"><b>Polly</b></a>
             <ul>
               <li><a href="#polly_npm">Completely switch to new pass manager</a></li>
             </ul>
           </li>

           <li>
             <b><a href="https://enzyme.mit.edu">Enzyme</a></b>
             <ul>
               <li><a href="#enzyme_tblgen">Move Enzyme Instruction Transformation Rules to Tablegen</a></li>
               <li><a href="#enzyme_vector">Vector Reverse-Mode Automatic Differentiation</a></li>
               <li><a href="#enzyme_pm">Enable The New Pass Manager</a></li>
             </ul>
           </li>
         </ul>
       </li>

       <li>
         <a href="#gsoc21">Google Summer of Code 2021</a>
         <ul>
           <li>
             <b>LLVM Core</b>
             <ul>
               <li><a href="#llvm_distributing_lit">Distributed lit testing</a></li>
               <li><a href="#llvm_loop_heuristics">Learning Loop Transformation Heuristics</a></li>
               <li><a href="#llvm_ir_fuzzing">Fuzzing LLVM-IR Passes</a></li>
               <li><a href="#llvm_ir_assume"><tt>llvm.assume</tt> the missing pieces</a></li>
               <li><a href="#llvm_shared_jitlink">Implement a shared-memory based JITLinkMemoryManager for out-of-process JITting</a></li>
               <li><a href="#llvm_build_jit_tutorial">Modernize the LLVM "Building A JIT" tutorial series</a></li>
               <li><a href="#llvm_jit_new_format">Write JITLink support for a new format/architecture</a></li>
               <li><a href="#llvm_ir_issues">Fix fundamental issues in LLVM's IR</a></li>
               <li><a href="#llvm_utilize_loopnest">Utilize LoopNest Pass</a></li>
             </ul>
           </li>

           <li><a href="http://clang.llvm.org/"><b>Clang</b></a>
             <ul>
               <li><a href="#clang-template-instantiation-sugar">Extend clang AST to
                 provide information for the type as written in template
                 instantiations</a>
               </li>
             </ul>
           </li>

           <li>
             <b>OpenMP</b>
             <ul>
               <li><a href="#openmp_gpu_jit">JIT-ing OpenMP GPU kernels transparently</a></li>
             </ul>
           </li>

           <li>
             <b>OpenACC</b>
             <ul>
               <li><a href="#openacc_rt_diagnostics">OpenACC Diagnostics from the OpenMP Runtime</a></li>
             </ul>
           </li>
           <li>
             <b><a href="https://polly.llvm.org">Polly</a></b>
             <ul>
               <li><a href="#polly_isl_bindings">Use official isl C++ bindings</a></li>
             </ul>
           </li>

           <li>
             <b><a href="https://enzyme.mit.edu">Enzyme</a></b>
             <ul>
               <li><a href="#enzyme_blas">Integrate custom derivatives of BLAS, Eigen, and similar routines into Enzyme</a></li>
               <li><a href="#enzyme_swift">Integrate Enzyme into Swift to provide high-performance differentiation in Swift</a></li>
               <li><a href="#enzyme_fixed">Differentiation of Fixed-Point Arithmetic</a></li>
               <li><a href="#enzyme_rust">Integrate Enzyme into Rust to provide high-performance differentiation in Rust</a></li>
             </ul>
           </li>

           <li>
             <b>Clang Static Analyzer</b>
             <ul>
               <li><a href="#static_analyzer_profling">Clang Static Analyzer performance profiling</a></li>
               <li><a href="#static_analyzer_constraint_solver">Clang Static Analyzer constraint solver improvements</a></li>
             </ul>
           </li>

           <li>
             <b>LLDB</b>
             <ul>
               <li><a href="#lldb_diagnostics">A structured approach to diagnostics in LLDB</a></li>
             </ul>
           </li>
         </ul>
       </li>

       <li>
         <a href="#gsoc20">Google Summer of Code 2020</a>
           <ul>
             <li>
               <b>LLVM Core</b>
               <ul>
                 <li><a href="#llvm_optimized_debugging">Improve debugging of optimized code</a></li>
                 <li><a href="#llvm_ipo">Improve inter-procedural analyses and optimizations</a></li>
                 <li><a href="#llvm_par">Improve parallelism-aware analyses and optimizations</a></li>
                 <li><a href="#llvm_dbg_invariant">Make LLVM passes debug info invariant</a></li>
                 <li><a href="#llvm_mergesim">Improve MergeFunctions to incorporate MergeSimilarFunction patches and ThinLTO Support</a></li>
                 <li><a href="#llvm_dwarf_yaml2obj">Add DWARF support to yaml2obj</a></li>
                 <li><a href="#llvm_hotcold">Improve hot cold splitting to aggressively outline small blocks</a></li>
                 <li><a href="#llvm_pass_order">Advanced Heuristics for Ordering Compiler Optimization Passes</a></li>
                 <li><a href="#llvm_ml_scc">Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations</a></li>
                 <li><a href="#llvm_postdominators">Add PostDominatorTree in LoopStandardAnalysisResults</a></li>
                 <li><a href="#llvm_loopnest">Create loop nest pass</a></li>
                 <li><a href="#llvm_instdump">Instruction properties dumper and checker</a></li>
                 <li><a href="#llvm_movecode">Unify ways to move code or check if code is safe to be moved</a></li>
               </ul>
             <li><a href="http://clang.llvm.org/"><b>Clang</b></a>
               <ul>
                 <li><a href="#clang-template-instantiation-sugar">Extend clang AST to
                     provide information for the type as written in template
                     instantiations</a>
                 </li>
                 <li><a href="#clang-sa-cplusplus-checkers">Find null smart pointer dereferences
                     with the Static Analyzer</a>
                 </li>
               </ul>
             </li>
             <li><a href="http://lldb.llvm.org/"><b>LLDB</b></a>
               <ul>
                 <li><a href="#lldb-autosuggestions">Support autosuggestions in LLDB's command line</a></li>
                 <li><a href="#lldb-more-completions">Implement the missing tab completions for LLDB's command line</a></li>
                 <li><a href="#lldb-reimplement-lldb-cmdline">Reimplement LLDB's command-line commands using the public SB API.</a></li>
                 <li><a href="#lldb-batch-testing">Add support for batch-testing to the LLDB testsuite.</a></li>
               </ul>
             </li>
             <li>
               <b>MLIR</b>
               <ul>
                 <li>See the <a href="https://mlir.llvm.org/getting_started/openprojects/">MLIR open project list</a></li>
               </ul>
             </li>
           </ul>
       </li>

       <li>
         <a href="#gsoc19">Google Summer of Code 2019</a>
         <ul>
           <li>
             <b>LLVM Core</b>
             <ul>
               <li><a href="#debuginfo_codegen_mismatch">Debug Info should have no
                   effect on codegen</a></li>
               <li><a href="#llvm_function_attributes">Improve (function) attribute
                   inference</a></li>
               <li><a href="#improve_binary_utilities">Improve LLVM binary utilities
               </a></li>
             </ul>
           </li>
           <li><a href="http://clang.llvm.org/"><b>Clang</b></a>
             <ul>
               <li><a href="#clang-astimporter-fuzzer">Implement an ASTImporter
                   fuzzer</a>
               </li>
               <li><a href="#improve-autocompletion">Improve shell autocompletion
                   for Clang</a>
               </li>
               <li><a href="#analyze-llvm">Apply the Clang Static Analyzer to LLVM-based
                   Projects</a>
               </li>
               <li><a href="#header-generation">Generate annotated sources based on
                   LLVM-IR analyses</a>
               </li>
             </ul>
           </li>
         </ul>
       </li>

       <li><a href="#gsoc18">Google Summer of Code 2018</a></li>
       <li><a href="#gsoc17">Google Summer of Code 2017</a></li>
     </ul>
   </li>

   <li><a href="#what">What is this?</a></li>
   <li><a href="#subprojects">LLVM Subprojects: Clang and more</a></li>
   <li><a href="#improving">Improving the current system</a>
   <ol>
     <li><a href="#target-desc">Factor out target descriptions</a></li>
     <li><a href="#code-cleanups">Implementing Code Cleanup bugs</a></li>
     <li><a href="#programs">Compile programs with the LLVM Compiler</a></li>
     <li><a href="#llvmtest">Add programs to the llvm-test suite</a></li>
     <li><a href="#benchmark">Benchmark the LLVM compiler</a></li>
     <li><a href="#statistics">Benchmark Statistics and Warning System</a></li>
     <li><a href="#coverage">Improving Coverage Reports</a></li>
     <li><a href="#misc_imp">Miscellaneous Improvements</a></li>
   </ol></li>

   <li><a href="#new">Adding new capabilities to LLVM</a>
   <ol>
     <li><a href="#llvm_ir">Extend the LLVM intermediate representation</a></li>
     <li><a href="#pointeranalysis">Pointer and Alias Analysis</a></li>
     <li><a href="#profileguided">Profile-Guided Optimization</a></li>
     <li><a href="#compaction">Code Compaction</a></li>
     <li><a href="#xforms">New Transformations and Analyses</a></li>
     <li><a href="#codegen">Code Generator Improvements</a></li>
     <li><a href="#misc_new">Miscellaneous Additions</a></li>
   </ol></li>

   <li><a href="#using">Project using LLVM</a>
   <ol>
     <li><a href="#machinemodulepass">Add a MachineModulePass</a></li>
     <li><a href="#encodeanalysis">Encode Analysis Results in MachineInstr IR</a></li>
     <li><a href="#codelayoutjit">Code Layout in the LLVM JIT</a></li>
     <li><a href="#fieldlayout">Improved Structure Splitting and Field Reordering</a></li>
     <li><a href="#slimmer">Finish the Slimmer Project</a></li>
   </ol></li>
 </ul>

 <div class="doc_author">
   <p>Written by the <a href="/">LLVM Team</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc24">Google Summer of Code 2024</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p>
     Welcome, prospective Google Summer of Code 2024 students! This document is
     your starting point for finding interesting and important projects for LLVM,
     Clang, and other related sub-projects. The projects on this list were not
     created solely for Google Summer of Code: they are open projects that
     genuinely need developers and would greatly benefit the LLVM community.
   </p>

   <p>We encourage you to look through this list and see which projects excite
     you and match your skill set. We also invite proposals that are not on this
     list. More information and discussion about GSoC can be found on
     <a href="https://discourse.llvm.org/c/community/gsoc" target="_blank">
       Discourse
     </a>. If you have questions about a particular project, please find the
     relevant entry on Discourse, check the previous discussion, and ask. If
     there is no such entry, or if you would like to propose an idea, please
     create a new entry. Feedback from the community is required for your
     proposal to be considered and, hopefully, accepted.
   </p>

   <p>The LLVM project has participated in Google Summer of Code for several years
     and has had some very successful projects. We hope that this year is no
     different and look forward to hearing your proposals. For information on how
     to submit a proposal, please visit the Google Summer of Code main
     <a href="https://summerofcode.withgoogle.com/">website.</a>
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="remove_ub_tests">Remove undefined behavior from tests</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Many of LLVM's unit tests have been reduced automatically from larger tests.
     Previous-generation reduction tools used undef and poison as placeholders
     everywhere and also introduced undefined behavior (UB).
     Tests with UB are undesirable because 1) they are fragile, since the
     compiler may start optimizing more aggressively in the future and break the
     test, and 2) they break translation validation tools such as
     <a href="https://github.com/AliveToolkit/alive2/">Alive2</a> (since it is
     correct to translate a function that is always UB into anything).

     <br />
     The major steps include:
     <ol>
       <li>Replace known patterns, such as branch on undef/poison and memory
           accesses with invalid pointers, with non-UB patterns.</li>
       <li>Use Alive2 to detect further patterns (by searching for tests that are
           always UB).</li>
       <li>Report any LLVM bug found by Alive2 that is exposed when removing
           UB.</li>
     </ol>
   </p>
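   <p>As a rough illustration of the first step, the following Python sketch
     flags lines of textual LLVM IR that match two of the patterns named above.
     The pattern table and the <tt>find_ub_candidates</tt> helper are
     hypothetical names chosen for this example, and a regular-expression scan
     is only an assumed starting point: it reports candidates for manual review
     and does not prove that a test is always UB.</p>

```python
import re

# Hypothetical pattern table: the names and regexes are illustrative only and
# cover just two of the "known patterns" mentioned in the project description.
UB_PATTERNS = {
    "branch-on-undef-or-poison": re.compile(r"\bbr\s+i1\s+(?:undef|poison)\b"),
    "access-through-undef-or-poison-pointer": re.compile(
        r"\b(?:load|store)\b.*\bptr\s+(?:undef|poison)\b"
    ),
}


def find_ub_candidates(ir_text):
    """Return (line number, pattern name) pairs for suspicious IR lines."""
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # ignore trailing IR comments
        for name, pattern in UB_PATTERNS.items():
            if pattern.search(code):
                hits.append((lineno, name))
    return hits
```

   <p>A real tool would still need to rewrite each hit into a non-UB form and
     confirm, for example with Alive2, that the test continues to exercise the
     intended transformation.</p>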

   <p><b>Expected result:</b>
     The majority of LLVM's unit tests will be free of UB.</p>

   <p><b>Skills:</b>
     Experience with scripting (Python or PHP) is required.
     Experience with regular expressions is encouraged.
   </p>

   <p><b>Project size:</b> Either medium or large.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentor:</b> <a href="https://web.ist.utl.pt/nuno.lopes/">Nuno Lopes</a></p>
   <p><b>Discourse:</b>
     <a href="https://discourse.llvm.org/t/gsoc-2004-remove-undefined-behavior-from-tests/77236">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="spirv_tablegen">Automatically generate TableGen file for SPIR-V instruction set</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The existing file that describes the SPIR-V instruction set in LLVM was
     manually created and is not always complete or up to date. Whenever new
     instructions need to be added to the SPIR-V backend, the file must be
     amended. In addition, since it is not created in a systematic way, there are
     often slight discrepancies between how an instruction is described in the
     SPIR-V spec and how it is declared in the TableGen file. Since SPIR-V
     backend developers often use the spec as a reference when developing new
     features, having a consistent mapping between the specification and TableGen
     records will ease development. This project proposes creating a script
     capable of generating a complete TableGen file that describes the SPIR-V
     instruction set given the JSON grammar available in the
     KhronosGroup/SPIRV-Headers repository, and updating SPIR-V backend code to
     use the new definitions. The specific method used for translating the JSON
     grammar to TableGen is left to the discretion of the applicant; however,
     it should be checked into the LLVM repository with well-documented
     instructions to replicate the translation process so that future maintainers
     will be able to regenerate the file when the grammar changes. Note that the
     grammar itself should remain out-of-tree in its existing separate
     repository.
   </p>
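   <p>To make the idea concrete, here is a minimal Python sketch of such a
     translation. It assumes the grammar exposes an <tt>instructions</tt> array
     whose entries carry <tt>opname</tt>, <tt>opcode</tt>, and an optional
     <tt>operands</tt> list with a <tt>kind</tt> field; the embedded sample is a
     hand-written excerpt in that shape, not the real file. The
     <tt>GeneratedInst</tt> TableGen class is hypothetical and does not
     correspond to any existing definition in the SPIR-V backend.</p>

```python
import json

# Hand-written excerpt in the assumed shape of the JSON grammar.
SAMPLE_GRAMMAR = """
{
  "instructions": [
    {"opname": "OpNop", "opcode": 0},
    {"opname": "OpUndef", "opcode": 1,
     "operands": [{"kind": "IdResultType"}, {"kind": "IdResult"}]}
  ]
}
"""


def emit_tablegen(grammar_json):
    """Emit one TableGen record per instruction in the grammar."""
    grammar = json.loads(grammar_json)
    records = []
    for inst in grammar["instructions"]:
        opname = inst["opname"]
        opcode = inst["opcode"]
        # Instructions without operands simply omit the "operands" key.
        operands = ", ".join(op["kind"] for op in inst.get("operands", []))
        # GeneratedInst is a hypothetical TableGen class used for illustration.
        records.append(f"def {opname} : GeneratedInst<{opcode}, [{operands}]>;")
    return "\n".join(records)
```

   <p>Because the output is derived entirely from the grammar, rerunning the
     script after a grammar update regenerates the records without manual
     edits.</p>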

   <p><b>Expected result:</b>
     <ul>
       <li>The SPIR-V instruction set's definition in TableGen is replaced with
       one that is autogenerated.</li>
       <li>A script and documentation are written that support regenerating the
       definitions as needed given the JSON grammar of the SPIR-V instruction
       set.</li>
       <li>Usage of the SPIR-V instruction set in the SPIR-V backend is updated
       to use the new autogenerated definitions.</li>
     </ul>
   </p>

   <p><b>Skills:</b>
     Experience with scripting and an intermediate knowledge of C++. Previous
     experience with LLVM/TableGen is a bonus but not required.
   </p>

   <p><b>Project size:</b> Medium (175 hours)</p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/sudonatalie/">Natalie Chouinard</a>,
     <a href="https://github.com/keenuts/">Nathan Gauër</a></p>

   <p><b>Discourse:</b>
     <a href="https://discourse.llvm.org/t/clang-automatically-generate-tablegen-file-for-spir-v-instruction-set/76369">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="bitstream_cas">LLVM bitstream integration with CAS (content-addressable storage)</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The LLVM bitstream file format is used for serialization of intermediate
     compiler artifacts, such as LLVM IR or Clang modules. There are situations
     where multiple bitstream files store identical information, and this
     duplication leads to increased storage requirements.
     <br><br>
     This project aims to integrate the LLVM CAS library into the LLVM bitstream
     file format. If we factor out the frequently duplicated part of a bitstream
     file into a separate CAS object, we can replace all copies with a small
     reference to the canonical CAS object, saving storage.
     <br><br>
     The primary motivating use-case for this project is the dependency scanner
     that's powering "implicitly-discovered, explicitly-built" Clang modules.
     There are real-world situations where even coarse de-duplication on the
     block level could halve the size of the scanning module cache.
   </p>
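   <p>The storage saving can be pictured with a toy content-addressable store.
     The Python sketch below is not the LLVM CAS API: <tt>ToyCAS</tt>,
     <tt>store_file</tt>, and <tt>load_file</tt> are hypothetical names, and
     SHA-256 is an assumed choice of hash. Each "file" is modelled as a list of
     blocks, and identical blocks are stored once and shared by reference.</p>

```python
import hashlib


class ToyCAS:
    """Toy content-addressable store: objects are keyed by their hash."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._objects.setdefault(digest, data)
        return digest

    def get(self, digest):
        return self._objects[digest]

    def stored_bytes(self):
        return sum(len(data) for data in self._objects.values())


def store_file(cas, blocks):
    """Replace each block with a reference (its digest) into the store."""
    return [cas.put(block) for block in blocks]


def load_file(cas, refs):
    """Reassemble the original bytes from a list of references."""
    return b"".join(cas.get(ref) for ref in refs)
```

   <p>Storing two files that share one large block keeps a single copy of that
     block, which is the block-level de-duplication described above.</p>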

   <p><b>Expected result:</b>
     There's a way to configure the LLVM bitstream writer/reader to use CAS as
     the backing storage.
   </p>

   <p><b>Skills:</b>
     Intermediate knowledge of C++, some familiarity with data serialization, self-motivation.
   </p>

   <p><b>Project size:</b> Medium or large</p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/jansvoboda11/">Jan Svoboda</a>,
     <a href="https://github.com/cachemeifyoucan/">Steven Wu</a></p>

   <p><b>Discourse:</b>
     <a href="https://discourse.llvm.org/t/llvm-bitstream-integration-with-cas-content-addressable-storage/76757">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="three_way_comparison">Add 3-way comparison intrinsics</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     <a href="https://en.wikipedia.org/wiki/Three-way_comparison">3-way comparisons</a>
     return the values -1, 0 or 1 depending on whether the first operand compares
     lower than, equal to, or greater than the second. They are exposed in C++
     via the spaceship operator
     (operator&lt;=&gt;) and in Rust via the PartialOrd and Ord traits.
     Currently, such comparisons produce sub-optimal codegen and optimization
     results in some cases.
     <br/><br/>
     The goal of this project is to resolve these optimization issues by
     implementing new 3-way comparison intrinsics, as described in
     <a href="https://discourse.llvm.org/t/rfc-add-3-way-comparison-intrinsics/76685">[RFC] Add 3-way comparison intrinsics</a>.
     The implementation steps are broadly:
     <ol>
       <li>Add the intrinsics to LLVM IR.</li>
       <li>Implement legalization/expansion support in SelectionDAG and
           GlobalISel.</li>
       <li>Implement optimization support in ConstantFolding, InstSimplify,
           InstCombine, CorrelatedValuePropagation, IndVarSimplify,
           ConstraintElimination, IPSCCP, and other relevant transforms.</li>
       <li>Make use of the intrinsics via InstCombine canonicalization or
           direct emission in clang/rustc.</li>
     </ol>
     Adding new target-independent intrinsics is a good way of becoming familiar with a broad slice of LLVM!
   </p>
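   <p>The intended result values can be modelled in a few lines of Python. This
     is only a sketch of the semantics, not of the intrinsics themselves: the
     function names are hypothetical, and the split into signed and unsigned
     interpretations of fixed-width integers is an assumption made here for
     illustration.</p>

```python
def three_way_compare(lhs, rhs):
    """Return -1, 0 or 1 when lhs is lower than, equal to, or greater than rhs."""
    return (lhs > rhs) - (lhs < rhs)


def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


def signed_compare(lhs, rhs, bits=8):
    """Three-way comparison of two bit patterns read as signed integers."""
    return three_way_compare(as_signed(lhs, bits), as_signed(rhs, bits))


def unsigned_compare(lhs, rhs, bits=8):
    """Three-way comparison of two bit patterns read as unsigned integers."""
    mask = (1 << bits) - 1
    return three_way_compare(lhs & mask, rhs & mask)
```

   <p>The same bit pattern can order differently under the two
     interpretations: with 8-bit operands, <tt>0xFF</tt> is -1 when read as
     signed but 255 when read as unsigned.</p>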

   <p><b>Expected result:</b>
     Support for the intrinsics in the backend and the most important
     optimization passes. Ideally full integration starting at the frontend.
   </p>

   <p><b>Skills:</b> Intermediate knowledge of C++ </p>

   <p><b>Project size:</b> Medium or large</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/nikic">Nikita Popov</a>,
     <a href="https://github.com/dc03">Dhruv Chawla</a></p>

   <p><b>Discourse:</b>
     <a href="https://discourse.llvm.org/t/llvm-add-3-way-comparison-intrinsics/76807">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_www">Improve the LLVM.org Website Look and Feel</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The llvm.org website serves as the central hub for information about the
     LLVM project, encompassing project details, current events, and relevant
     resources. Over time, the website has evolved organically, prompting the
     need for a redesign to enhance its modernity, structure, and ease of
     maintenance.
     <br/><br/>
     The goal of this project is to create a contemporary and coherent static
     website that reflects the essence of LLVM.org. This redesign aims to improve
     navigation, taxonomy, content discoverability, mobile device support,
     accessibility, and overall usability. Given
     the critical role of the website in the community, efforts will be made to
     engage with community members, seeking consensus on the proposed changes.
   </p>

   <p><b>Expected result:</b>
     A modern, coherent-looking website that attracts prospective new users and
     empowers the existing community with better navigation, taxonomy, content
     discoverability, and overall usability. Since the website is critical
     infrastructure and most of the community will have an opinion, this project
     should engage with the community and build consensus on the steps being
     taken. Suggested approach:
     <ul>
       <li>Conduct a comprehensive content audit of the existing website.</li>
       <li>Select appropriate technologies, preferably static site generators
         like Hugo or Jekyll.</li>
       <li>Advocate for a separation of data and visualization, utilizing formats
         such as YAML and Markdown to facilitate content management without
         direct HTML coding.</li>
       <li>Present three design mockups for the new website, fostering open
         discussions and allowing time for alternative proposals from interested
         parties.</li>
       <li>Implement the chosen design, incorporating valuable feedback from the
         community.</li>
       <li>Collaborate with content creators to integrate or update content as
         needed.</li>
     </ul>
     The successful candidate should commit to regular participation in weekly
     meetings, deliver presentations, and contribute blog posts as requested.
     Additionally, they should demonstrate the ability to navigate the community
     process with patience and understanding.
   </p>

   <p><b>Skills:</b>
     Knowledge of web development with static site generators.
     Knowledge of HTML, CSS, Bootstrap, and Markdown. Patience and self-motivation.
   </p>

   <p><b>Difficulty:</b> Hard</p>

   <p><b>Project size:</b> Large</p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/tlattner">Tanya Lattner</a>,
     <a href="https://github.com/vgvassilev">Vassil Vassilev</a>
   </p>
   <p><b>Discourse:</b>
     <a href="https://discourse.llvm.org/t/improve-the-llvm-org-website-look-and-feel/76864">URL</a>
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-repl-out-of-process">Out-of-process execution for clang-repl</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The Clang compiler is part of the LLVM compiler infrastructure and supports
     various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
     Clang enables them to be used as libraries, and has led to the creation of
     an entire compiler-assisted ecosystem of tools. The relatively friendly
     codebase of Clang and advancements in the JIT infrastructure in LLVM further
     enable research into different methods for processing C++ by blurring the
     boundary between compile time and runtime. Challenges include incremental
     compilation and fitting compile/link time optimizations into a more dynamic
     environment.
     <br /> <br />
     Incremental compilation pipelines process code chunk-by-chunk by building an
     ever-growing translation unit. Code is then lowered into the LLVM IR and
     subsequently run by the LLVM JIT. Such a pipeline allows creation of
     efficient interpreters. The interpreter enables interactive exploration and
     makes the C++ language more user friendly. Clang-Repl is one example.
     <br /> <br />
     Clang-Repl uses the ORCv2 JIT infrastructure within the same process. That
     design is efficient and easy to implement; however, it suffers from two
     significant drawbacks. First, it cannot be used on devices which do not have
     sufficient resources to host the entire infrastructure, such as the Arduino
     Due (see this
     <a href="https://compiler-research.org/meetings/#caas_10Mar2022">talk</a>
     for more details). Second, a crash in user code brings down the entire
     process, hindering overall reliability and ease of use.
     <br /> <br />
     This project aims to move Clang-Repl to an out-of-process execution model
     in order to address both of these issues.
   </p>
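   <p>
     To illustrate the crash-isolation argument above, here is a minimal POSIX
     sketch. It is not how Clang-Repl or ORCv2 implement remote execution; the
     names <code>Statement</code> and <code>runOutOfProcess</code> are
     hypothetical, and a plain <code>fork</code> stands in for a real executor
     process. It only shows that a crash in "user code" can be observed by the
     host instead of terminating it.
   </p>

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for "execute one REPL statement"; not a Clang-Repl API.
using Statement = void (*)();

// Runs Stmt in a forked child and reports how the child ended, so that a crash
// inside Stmt cannot take down the calling (host) process.
std::string runOutOfProcess(Statement Stmt) {
  pid_t Child = fork();
  if (Child < 0)
    return "fork failed";
  if (Child == 0) {
    Stmt();   // user code runs only in the child
    _exit(0); // leave without running the host's atexit handlers
  }
  int Status = 0;
  if (waitpid(Child, &Status, 0) < 0)
    return "waitpid failed";
  if (WIFSIGNALED(Status))
    return "crashed with signal " + std::to_string(WTERMSIG(Status));
  return "exited with code " + std::to_string(WEXITSTATUS(Status));
}

// Two toy "statements": one well behaved, one that aborts.
void wellBehaved() {}
void crashing() { std::abort(); }
```

   <p>
     A host loop could call <code>runOutOfProcess</code> once per statement and
     decide whether to restart or continue the session based on the returned
     status; how session state would be preserved across a crash is exactly
     the open question this project asks about.
   </p>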

   <p><b>Expected result:</b>
     Implement out-of-process execution of statements with Clang-Repl;
     demonstrate that Clang-Repl can support some of the ez-clang use cases;
     research approaches to restart or continue the session after a crash;
     as a stretch goal, design a versatile reliability approach for crash recovery.
   </p>

   <p><b>Skills:</b>
     Intermediate knowledge of C++; understanding of LLVM and the LLVM JIT in particular.
   </p>

   <p><b>Project size:</b> Either medium or large.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/vgvassilev>Vassil Vassilev</a>
   </p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-out-of-process-execution-for-clang-repl/68225">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-plugins-windows">Support clang plugins on Windows</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b>
     The Clang compiler is part of the LLVM compiler infrastructure and supports
     various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
     Clang allows the compiler to be extended with plugins [1]. A plugin makes it
     possible to run extra user-defined actions during a compilation. Plugins
     are supported on Unix and Darwin but not on Windows due to some specifics of
     the Windows platform.
     <br /> <br />
     This project would expose the participant to a broad cross-section of the
     LLVM codebase. It involves exploring the API surface, classifying the
     interfaces as public or private, and annotating the API declarations
     accordingly. It would also expose the participant to the details of, and
     differences between, platforms, as this work is cross-platform (Windows,
     Linux, Darwin, BSD, etc). The resulting changes would improve LLVM on Linux
     and Windows while enabling new functionality on Windows.
   </p>
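   <p>
     As a rough illustration of what "annotating the API declarations" can mean
     in practice, the sketch below marks one function as public and leaves a
     helper private. The macro names <code>HYPOTHETICAL_ABI</code> and
     <code>HYPOTHETICAL_BUILDING_LIBRARY</code> are invented for this example
     and are not the names used by LLVM, the prototype [3], or the annotation
     tool [4].
   </p>

```cpp
#include <cassert>

// Normally set by the build system while compiling the library itself;
// defined here only so the sketch is self-contained.
#define HYPOTHETICAL_BUILDING_LIBRARY 1

// HYPOTHETICAL_ABI is an illustrative macro name, not one defined by LLVM.
#if defined(_WIN32)
#  if defined(HYPOTHETICAL_BUILDING_LIBRARY)
#    define HYPOTHETICAL_ABI __declspec(dllexport) // building the DLL
#  else
#    define HYPOTHETICAL_ABI __declspec(dllimport) // consuming the DLL
#  endif
#else
#  define HYPOTHETICAL_ABI __attribute__((visibility("default")))
#endif

// Public interface: annotated, so code outside the library (for example a
// plugin) can link against it.
HYPOTHETICAL_ABI int publicEntryPoint(int X);

// Private helper: deliberately left unannotated. On Windows it is not exported
// from the DLL; on ELF and Mach-O it is hidden when the library is built with
// -fvisibility=hidden.
int privateHelper(int X);

int privateHelper(int X) { return X * 2; }
int publicEntryPoint(int X) { return privateHelper(X) + 1; }
```

   <p>
     On Windows nothing is exported from a DLL unless it is marked (or listed in
     a module definition file), which is why the public/private classification
     has to be made explicit before plugins can resolve symbols from Clang.
   </p>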

   <p><b>Expected result:</b>
     This project aims to make <code>clang -fplugin=windows/plugin.dll</code>
     work. The implementation approach should extend the working prototype [3]
     and the annotation tool [4]. The successful candidate should be prepared to
     attend a weekly meeting, make presentations and prepare blog posts upon
     request.
   </p>

   <p><i>Further reading</i><br />
     [1] https://clang.llvm.org/docs/ClangPlugins.html
     <br />
     [2] https://discourse.llvm.org/t/clang-plugins-on-windows
     <br />
     [3] https://github.com/llvm/llvm-project/pull/67502
     <br />
     [4] https://github.com/compnerd/ids
   </p>

   <p><b>Skills:</b>
     Intermediate knowledge of C++; experience with Windows and its compilation
     and linking model.
   </p>

   <p><b>Project size:</b> Either medium or large.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentors:</b>
     <a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
     <a href=https://github.com/compnerd>Saleem Abdulrasool</a>
   </p>


   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/support-clang-plugins-on-windows/76408">URL</a></p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-on-demand-parsing">On Demand Parsing in Clang</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b> Clang, like any C++ compiler, parses a
     sequence of characters as they appear, linearly. The linear character
     sequence is then turned into tokens and AST before lowering to machine
     code. In many cases the end-user code uses a small portion of the C++
     entities from the entire translation unit but the user still pays the price
     for compiling all of the redundancies.
     <br /> <br />
     This project proposes to process C++ entities that are expensive to compile
     when they are used rather than eagerly. This approach is already adopted in
     Clang’s CodeGen, where it allows Clang to produce code only for what is
     being used. On-demand compilation is expected to significantly reduce peak
     memory use during compilation and improve the compile time for translation
     units which sparsely use their contents. In addition, it would have a
     significant impact on interactive C++, where header inclusion essentially
     becomes a no-op and entities are parsed only on demand.
     <br /> <br />
     The Cling interpreter implements a very naive but efficient
     cross-translation unit lazy compilation optimization which scales across
     hundreds of libraries in the field of high-energy physics.
     <br /> <br />
     <pre>
 // A.h
 #include &lt;string&gt;
 #include &lt;vector&gt;
 template &lt;class T, class U = int&gt; struct AStruct {
   void doIt() { /*...*/ }
   const char* data;
   // ...
 };

 template&lt;class T, class U = AStruct&lt;T&gt;&gt;
 inline void freeFunction() { /* ... */ }
 inline void doit(unsigned N = 1) { /* ... */ }

 // Main.cpp
 #include &quot;A.h&quot;
 int main() {
   doit();
   return 0;
 }
     </pre>
     <br /> <br />
     This pathological example expands to 37253 lines of code to process. Cling
     builds an index (which it calls an autoloading map) that contains only
     forward declarations of these C++ entities. The index is 3000 lines of
     code and looks like:
     <pre>
 // A.h.index
 namespace std{inline namespace __1{template &lt;class _Tp, class _Allocator&gt; class __attribute__((annotate(&quot;$clingAutoload$vector&quot;)))  __attribute__((annotate(&quot;$clingAutoload$A.h&quot;)))  __vector_base;
   }}
 ...
 template &lt;class T, class U = int&gt; struct __attribute__((annotate(&quot;$clingAutoload$A.h&quot;))) AStruct;
     </pre>
     <br /> <br />
     Upon requiring the complete type of an entity, Cling includes the relevant
     header file to get it. There are several trivial workarounds to deal with
     default arguments and default template arguments as they now appear on the
     forward declaration and then the definition. You can read more in [1].
     <br /> <br />
     Although the implementation could not be called a reference implementation,
     it shows that the Parser and the Preprocessor of Clang are relatively
     stateless and can be used to process character sequences which are not
     linear in their nature. In particular namespace-scope definitions are
     relatively easy to handle and it is not very difficult to return to
     namespace-scope when we lazily parse something. For other contexts such as
     local classes we will have lost some essential information such as name
     lookup tables for local entities. However, these cases are probably not very
     interesting as the lazy parsing granularity is probably worth doing only for
     top-level entities.
     <br /> <br />
     Such an implementation can help with existing issues in the standard
     such as CWG2335, under which the delayed portions of classes get parsed
     immediately when they're first needed, if that first usage precedes the end
     of the class. That should give good motivation to upstream all the
     operations needed to return to an enclosing scope and parse something.
     <br /> <br />
     <b>Implementation approach</b>: Upon seeing a tag definition during parsing
     we could create a forward declaration, record the token sequence and mark it
     as a lazy definition. Later upon complete type request, we could re-position
     the parser to parse the definition body. We already skip some of the
     template specializations in a similar way [2, 3].
     <br /> <br />
     Another approach is for every lazily parsed entity to record its token
     stream, and to change the Toks stored on LateParsedDeclarations to
     optionally refer to a subsequence of the externally stored token sequence
     instead of storing its own sequence (or maybe change CachedTokens so it can
     do that transparently). One of the challenges would be that we currently
     modify the cached tokens list to append an "eof" token, but it should be
     possible to handle that in a different way.
     <br /> <br />
     In some cases, a class definition can affect its surrounding context in a
     few ways you'll need to be careful about here:
     <br /> <br />
     1) <code>struct X</code> appearing inside the class can introduce the name
         <code>X</code> into the enclosing context.
     <br /> <br />
     2) <code>static inline</code> declarations can introduce global variables with
         non-constant initializers that may have arbitrary side-effects.
     <br /> <br />
     For point (2), there's a more general problem: parsing any expression can
     trigger a template instantiation of a class template that has a static data
     member with an initializer that has side-effects. Unlike the above two
     cases, I don't think there's any way we can correctly detect and handle such
     cases by some simple analysis of the token stream; actual semantic analysis
     is required to detect such cases. But perhaps if they happen only in code
     that is itself unused, it wouldn't be terrible for Clang to have a language
     mode that doesn't guarantee that such instantiations actually happen.
     <br /> <br />
     An alternative and more efficient implementation could be to make the
     lookup tables range-based, but we do not have even a prototype proving that
     this is a feasible approach.
   </p>
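   <p>
     The first implementation approach above (create a forward declaration,
     record the token sequence, and parse the body only when a complete type is
     requested) can be modelled in a few lines. This is a toy model under stated
     assumptions: <code>LazyTagDecl</code>, <code>ParsedBody</code> and
     <code>requireCompleteType</code> are hypothetical names that do not exist
     in Clang, and "parsing" is reduced to counting the recorded tokens.
   </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these types exist in Clang.
struct ParsedBody {
  std::size_t NumTokens = 0;
};

class LazyTagDecl {
  std::string Name;
  std::vector<std::string> RecordedToks; // token sequence of the skipped body
  ParsedBody Definition;                 // meaningful only once Parsed is true
  bool Parsed = false;                   // false while only forward-declared
  unsigned ParseCount = 0;               // how often the body was really parsed

public:
  LazyTagDecl(std::string DeclName, std::vector<std::string> Toks)
      : Name(std::move(DeclName)), RecordedToks(std::move(Toks)) {}

  const std::string &name() const { return Name; }
  bool isComplete() const { return Parsed; }
  unsigned parseCount() const { return ParseCount; }

  // Called when a complete type is required. The body is parsed on the first
  // request only; later requests reuse the result.
  const ParsedBody &requireCompleteType() {
    if (!Parsed) {
      ++ParseCount; // stand-in for re-positioning the parser on RecordedToks
      Definition.NumTokens = RecordedToks.size();
      Parsed = true;
    }
    return Definition;
  }
};
```

   <p>
     A declaration that is never used stays forward-declared and costs only the
     recorded tokens. The model ignores everything that makes the real problem
     hard, such as restoring scope and the side effects on the enclosing
     context discussed above.
   </p>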

   <p><b>Expected result:</b>
     <ul>
       <li>Design and implementation of on-demand compilation for non-templated functions</li>
       <li>Support non-templated structs and classes</li>
       <li>Run performance benchmarks on relevant codebases and prepare report</li>
       <li>Prepare a community RFC document</li>
       <li>[Stretch goal] Support templates</li>
     </ul>

     The successful candidate should commit to regular participation in weekly
     meetings, deliver presentations, and contribute blog posts as
     requested. Additionally, they should demonstrate the ability to navigate the
     community process with patience and understanding.
   </p>

   <p><i>Further reading</i><br/>
     [1] https://github.com/root-project/root/blob/master/README/README.CXXMODULES.md#header-parsing-in-root
     <br />
     [2] https://github.com/llvm/llvm-project/commit/b9fa99649bc99
     <br />
     [3] https://github.com/llvm/llvm-project/commit/0f192e89405ce
   </p>

   <p><b>Skills:</b>
     Knowledge of C++, a deeper understanding of how Clang works, and
     knowledge of the Clang AST and Preprocessor.
   </p>

   <p><b>Project size:</b> Large</p>
   <p><b>Difficulty:</b> Hard</p>
   <p><b>Confirmed Mentors:</b>
     <a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
     <a href=https://github.com/mizvekov>Matheus Izvekov</a>
   </p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/on-demand-parsing-in-clang/76912">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-doc-improve-usability">Improve Clang-Doc Usability</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b>
     <a href=https://clang.llvm.org/extra/clang-doc.html>Clang-Doc</a> is a
     C/C++ documentation generation tool created as an alternative for Doxygen
     and built on top of LibTooling. This effort started in 2018 and critical
     mass landed in 2019, but development has been largely dormant since
     then, mostly due to a lack of resources.
     <br /> <br />
     The tool can currently generate documentation in Markdown and HTML formats,
     but it has structural issues and is difficult to use, and the generated
     documentation has usability issues and is missing several key features:
     <ul>
       <li>Not all C/C++ constructs are currently handled by the Markdown and
         HTML emitter limiting the tool’s usability.</li>
       <li>The generated HTML output does not scale with the size of the
         codebase making it unusable for larger C/C++ projects.</li>
       <li>The implementation does not always use the most efficient or
         appropriate data structures which leads to correctness and performance
         issues.</li>
       <li>There is a lot of duplicated boilerplate code which could be
         improved with templates and helpers.</li>
     </ul>
   </p>

   <p><b>Expected result:</b>
     The goal of this project is to address the existing shortcomings and
     improve the usability of Clang-Doc to the point where it can be used to
     generate documentation for large scale projects such as LLVM. The ideal
     outcome is that the LLVM project will use Clang-Doc for generating its <a
     href=https://llvm.org/doxygen/>reference documentation</a>.
     <br /><br />
     Successful proposals should focus not only on addressing the existing
     limitations, but also draw inspiration for other potential improvements
     from other similar tools such as <a href=https://hdoc.io/>hdoc</a>, <a
     href=https://github.com/standardese/standardese>standardese</a>, <a
     href=https://github.com/chromium/subspace/tree/main/subdoc>subdoc</a> or
     <a href=https://cs.opensource.google/fuchsia/fuchsia/+/main:tools/cppdocgen/>cppdocgen</a>.
   </p>

   <p><b>Skills:</b>
     Experience with web technologies (HTML, CSS, JS) and an intermediate
     knowledge of C++. Previous experience with Clang/LibTooling is a bonus but
     not required.
   </p>

   <p><b>Project size:</b> Either medium or large.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/petrhosek>Petr Hosek</a>,
     <a href=https://github.com/ilovepi>Paul Kirth</a>
   </p>

   <p><b>Discourse:</b> <a href=https://discourse.llvm.org/t/improve-clang-doc-usability/76996>URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="rich-disassembler-for-lldb">Rich Disassembler for LLDB</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><b>Description</b></p>

 <p>Use the variable location information from the debug info to annotate LLDB’s disassembler (and <code>register read</code>) output with the location and lifetime of source variables. The rich disassembler output should be exposed as structured data and made available through LLDB’s scripting API so more tooling could be built on top of this. In a terminal, LLDB should render the annotations as text.</p>

 <p><b>Expected outcomes</b></p>

 For example, we could augment the disassembly for the following function

 <pre>
 frame #0: 0x0000000100000f80 a.out`main(argc=1, argv=0x00007ff7bfeff1d8) at demo.c:4:10 [opt]
   1   void puts(const char*);
   2   int main(int argc, char **argv) {
   3    for (int i = 0; i < argc; ++i)
 → 4      puts(argv[i]);
   5    return 0;
   6   }
 (lldb) disassemble
 a.out`main:
 ...
   0x100000f71 <+17>: movl  %edi, %r14d
   0x100000f74 <+20>: xorl  %r15d, %r15d
   0x100000f77 <+23>: nopw  (%rax,%rax)
 →  0x100000f80 <+32>: movq  (%rbx,%r15,8), %rdi
   0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts
   0x100000f89 <+41>: incq  %r15
   0x100000f8c <+44>: cmpq  %r15, %r14
   0x100000f8f <+47>: jne 0x100000f80 ; <+32> at demo.c:4:10
   0x100000f91 <+49>: addq  $0x8, %rsp
   0x100000f95 <+53>: popq  %rbx
 ...
 </pre>

 <p>using the debug information that LLDB also has access to (observe how the source variable i is in r15 from [0x100000f77+slide))</p>


 <pre>
 $ dwarfdump demo.dSYM --name  i
 demo.dSYM/Contents/Resources/DWARF/demo: file format Mach-O 64-bit x86-64
 0x00000076: DW_TAG_variable
  DW_AT_location (0x00000098:
  [0x0000000100000f60, 0x0000000100000f77): DW_OP_consts +0, DW_OP_stack_value
  [0x0000000100000f77, 0x0000000100000f91): DW_OP_reg15 R15)
  DW_AT_name ("i")
  DW_AT_decl_file ("/tmp/t.c")
  DW_AT_decl_line (3)
  DW_AT_type (0x000000b2 "int")
 </pre>

 to produce output like this, where we annotate when a variable is live and what its location is:

 <pre>
 (lldb) disassemble
 a.out`main:
 ...                                                               ; i=0
   0x100000f74 <+20>: xorl  %r15d, %r15d                           ; i=r15
   0x100000f77 <+23>: nopw  (%rax,%rax)                            ; |
 →  0x100000f80 <+32>: movq  (%rbx,%r15,8), %rdi                   ; |
   0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts    ; |
   0x100000f89 <+41>: incq  %r15                                   ; |
   0x100000f8c <+44>: cmpq  %r15, %r14                             ; |
   0x100000f8f <+47>: jne 0x100000f80 ; <+32> at t.c:4:10          ; |
   0x100000f91 <+49>: addq  $0x8, %rsp                             ; i=undef
   0x100000f95 <+53>: popq  %rbx
 </pre>

 <p>The goal would be to produce output like this for a subset of unambiguous cases, for example, variables that are constant or fully in registers.</p>
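 <p>The core lookup behind such annotations can be sketched as follows. This is an illustrative model, not LLDB code: <code>LocationRange</code> and <code>annotate</code> are hypothetical names, and the table is transcribed by hand from the <code>dwarfdump</code> output above rather than read from debug info.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry of a DWARF location list: the variable is described by Location
// for addresses in the half-open range [Lo, Hi).
struct LocationRange {
  std::uint64_t Lo;
  std::uint64_t Hi;
  std::string Location;
};

// Hypothetical helper, not an LLDB API: returns "name=location" if one of the
// ranges covers Addr, or an empty string if the variable has no location there.
std::string annotate(const std::string &Name,
                     const std::vector<LocationRange> &Ranges,
                     std::uint64_t Addr) {
  for (const LocationRange &R : Ranges)
    if (Addr >= R.Lo && Addr < R.Hi)
      return Name + "=" + R.Location;
  return "";
}

// The location list of `i`, copied from the dwarfdump output above:
// constant 0 first, then register r15.
const std::vector<LocationRange> LocationsOfI = {
    {0x100000f60ULL, 0x100000f77ULL, "0"},
    {0x100000f77ULL, 0x100000f91ULL, "r15"},
};
```

 <p>Looking up <code>0x100000f80</code> yields <code>i=r15</code>, and <code>0x100000f91</code> yields no location. Note that a location list describes the state before the instruction at an address executes, so deciding on which line an annotation should first appear is a presentation choice the project would still have to make.</p>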

 <p><b>Confirmed mentors and their contacts</b></p>

 <ul>
   <li>Adrian Prantl aprantl@apple.com (primary contact)
   <li>Jonas Devlieghere jdevlieghere@apple.com
 </ul>

 <p><b>Required / desired skills</b></p>

 <p>Required:</p>

 <ul>
   <li>Good understanding of C++
   <li>Familiarity with using a debugger on the terminal
   <li>Need to be familiar with all the concepts mentioned in the example above
   <li>Need to have a good understanding of at least one assembler dialect for machine code (x86_64 or AArch64).
 </ul>

 <p>Desired:</p>

 <ul>
   <li>Compiler knowledge including data flow and control flow analysis is a plus.
   <li>Being able to navigate debug information (DWARF) is a plus.
 </ul>

 <p><b>Size of the project.</b></p>

 <p>medium (~175h)</p>

 <p><b>An easy, medium or hard rating if possible</b></p>

 <p>hard</p>
 <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/rich-disassembler-for-lldb/76952">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="gpu-delta-debugging">GPU Delta Debugging</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><b>Description</b></p>

 <p>
 llvm-reduce and similar tools perform delta debugging, but they are less useful when many implicit constraints exist, because violating those constraints can easily lead to errors similar to the one that is to be isolated. This project is about developing a GPU-aware version, especially for execution-time bugs, that can be used in conjunction with LLVM/OpenMP GPU record-and-replay, or simply a GPU loader script, to minimize GPU test cases more efficiently and effectively.
 </p>

 <p><b>Expected outcomes</b></p>

   <p>A tool to reduce GPU errors without losing the original error. Optionally, other properties could be the focus of the reduction, not only errors. </p>
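   <p>For readers unfamiliar with delta debugging, the sketch below shows a simplified variant of the classic ddmin loop on a plain list of integers. It is not llvm-reduce and not GPU-aware; <code>reduce</code>, <code>Input</code> and <code>StillFails</code> are names invented for this example. In the setting of this project the predicate would, for instance, rebuild and rerun a candidate kernel and check that the original error (and not a different one) still occurs.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Input = std::vector<int>;
using StillFails = std::function<bool(const Input &)>;

// Simplified ddmin: split the input into NumChunks pieces and try to drop one
// piece at a time. Keep any removal that preserves the failure; if no piece
// can be dropped, retry with finer pieces until single elements are reached.
Input reduce(Input Current, const StillFails &Fails) {
  assert(Fails(Current) && "the starting input must exhibit the failure");
  std::size_t NumChunks = 2;
  while (Current.size() >= 2) {
    const std::size_t ChunkSize = (Current.size() + NumChunks - 1) / NumChunks;
    bool Removed = false;
    for (std::size_t Begin = 0; Begin < Current.size(); Begin += ChunkSize) {
      const std::size_t End = std::min(Begin + ChunkSize, Current.size());
      // Candidate = Current without the elements in [Begin, End).
      Input Candidate(Current.begin(), Current.begin() + Begin);
      Candidate.insert(Candidate.end(), Current.begin() + End, Current.end());
      if (Fails(Candidate)) {
        Current = std::move(Candidate);
        NumChunks = std::max<std::size_t>(NumChunks - 1, 2);
        Removed = true;
        break;
      }
    }
    if (!Removed) {
      if (NumChunks >= Current.size())
        break; // already at single-element granularity: nothing more to drop
      NumChunks = std::min(NumChunks * 2, Current.size());
    }
  }
  return Current;
}
```

   <p>The result is minimal only in the sense that no single remaining element can be removed. The implicit-constraint problem described above shows up in the predicate: if a candidate fails for a different reason and the predicate cannot tell, the reduction drifts away from the original bug.</p>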

 <p><b>Confirmed mentors and their contacts</b></p>

 <ul>
   <li>Parasyris, Konstantinos parasyris1@llnl.gov
   <li>Johannes Doerfert jdoerfert@llnl.gov
 </ul>

 <p><b>Required / desired skills</b></p>

 <p>Required:</p>

 <ul>
   <li>Good understanding of C++
   <li>Familiarity with GPUs and LLVM-IR
 </ul>

 <p>Desired:</p>

 <ul>
   <li>Compiler knowledge including data flow and control flow analysis is a plus.
   <li>Experience with debugging and bug reduction techniques (llvm-reduce) is helpful
 </ul>

 <p><b>Size of the project.</b></p>

 <p>medium</p>

 <p><b>An easy, medium or hard rating if possible</b></p>

 <p>medium</p>

 <p><b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/gsoc-2024-gpu-delta-debugging/77237">URL</a>
 </p>

 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="offload-libcxx">Offloading libcxx</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><b>Description</b></p>

 <p>
 Modern C++ defines parallel algorithms as part of the standard library, like <code>std::transform_reduce(std::execution::par_unseq, vec.begin(), vec.end(), 0, std::plus&lt;int&gt;{}, …)</code>. In this project we want to extend an implementation of those that is using OpenMP, including GPU offload, where reasonable. While some algorithms might be amenable to GPU offload via a pure (wrapper) runtime solution, we know others, especially those featuring user-provided functors, will also require static program analysis and potentially transformation for additional data management. The goal of the project is to explore different algorithms and the options we have to execute them on the host as well as on accelerator devices, esp. GPUs, automatically via OpenMP.
 </p>
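 <p>
 To make the idea concrete, the sketch below puts a standard-library reduction next to a hand-written OpenMP offload loop that computes the same value. It is not the libcxx prototype; the function names are invented for this example, and the pragma is only one plausible shape for what a wrapper-based backend could emit. The sketch assumes C++17 for <code>std::transform_reduce</code>.
 </p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Sequential reference using the standard algorithm (no execution policy).
long sumOfSquaresReference(const std::vector<int> &Vec) {
  return std::transform_reduce(Vec.begin(), Vec.end(), 0L, std::plus<long>{},
                               [](int X) { return static_cast<long>(X) * X; });
}

// Hand-written OpenMP offload of the same reduction. Compiled without OpenMP
// support the pragma is ignored and the loop simply runs on the host.
long sumOfSquaresOffload(const std::vector<int> &Vec) {
  const int *Data = Vec.data();
  const std::size_t N = Vec.size();
  long Sum = 0;
  // The input array must be mapped to the device explicitly; this is the kind
  // of data management the project description refers to.
#pragma omp target teams distribute parallel for map(to : Data[0:N]) map(tofrom : Sum) reduction(+ : Sum)
  for (std::size_t I = 0; I < N; ++I)
    Sum += static_cast<long>(Data[I]) * Data[I];
  return Sum;
}
```

 <p>
 Here the transform is fixed and touches only its argument, so mapping the input array is enough. A user-provided functor that captures pointers to other host data would need those mapped as well, which is where static analysis comes in.
 </p>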

 <p><b>Expected outcomes</b></p>

   <p> Improvements to the prototype support of offloading in libcxx. Evaluations against other offloading approaches and documentation on the missing parts and shortcomings. </p>

 <p><b>Confirmed mentors and their contacts</b></p>

 <ul>
   <li>Johannes Doerfert jdoerfert@llnl.gov
   <li>Tom Scogland scogland1@llnl.gov
   <li>Tom Deakin tom.deakin@bristol.ac.uk
 </ul>

 <p><b>Required / desired skills</b></p>

 <p>Required:</p>

 <ul>
   <li>Good understanding of C++ and C++ standard algorithms
   <li>Familiarity with GPUs and (OpenMP) offloading
 </ul>

 <p>Desired:</p>

 <ul>
   <li>Experience with libcxx (development).
   <li>Experience debugging and profiling GPU code.
 </ul>

 <p><b>Size of the project.</b></p>

 <p>large</p>

 <p><b>An easy, medium or hard rating if possible</b></p>

 <p>medium</p>

 <p><b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/gsoc-2024-offloading-libcxx/77238">URL</a>
 </p>

 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="parameter-tuning">The 1001 thresholds in LLVM</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><b>Description</b></p>

 <p>
 LLVM has lots of thresholds and flags to avoid "costly cases". However, it is unclear whether these thresholds are useful, whether their values are reasonable, and what impact they really have. Since there are so many, we cannot do a simple exhaustive search. In some prototype work we introduced a C++ class that can replace hardcoded values and offers control over the threshold, e.g., you can increase the recursion limit via a command line flag from the hardcoded "6" to a different number. In this project we want to explore the thresholds, when they are hit, what it means if they are hit, how we should select their values, and if we need different "profiles".
 </p>
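 <p>
 A class of the kind described above could look roughly like the sketch below. This is a guess at the general shape, not the prototype itself: <code>TunableThreshold</code>, <code>setOverride</code> and the threshold name are hypothetical, and a static table stands in for LLVM's command line handling.
 </p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch; the class used in the prototype work may look different.
class TunableThreshold {
  std::string Name;
  unsigned Value;
  unsigned long NumHits = 0;

  // Overrides by threshold name; a stand-in for values parsed from flags.
  static std::map<std::string, unsigned> &overrides() {
    static std::map<std::string, unsigned> Table;
    return Table;
  }

public:
  // Stand-in for a command line flag that changes one named threshold.
  static void setOverride(const std::string &ThresholdName, unsigned NewValue) {
    overrides()[ThresholdName] = NewValue;
  }

  // Uses Default unless an override was registered for this name beforehand.
  TunableThreshold(std::string ThresholdName, unsigned Default)
      : Name(std::move(ThresholdName)), Value(Default) {
    auto It = overrides().find(Name);
    if (It != overrides().end())
      Value = It->second;
  }

  // Returns true if Depth has reached the limit, and counts how often that
  // happens so the impact of the threshold can be measured.
  bool exceededBy(unsigned Depth) {
    if (Depth < Value)
      return false;
    ++NumHits;
    return true;
  }

  unsigned value() const { return Value; }
  unsigned long hits() const { return NumHits; }
};
```

 <p>
 Replacing a literal such as <code>6</code> with an object like this gives both of the things the project needs: a way to vary the value without rebuilding, and a hit counter that says when the threshold actually matters.
 </p>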

 <p><b>Expected outcomes</b></p>

   <p> Statistical evidence on the impact of various thresholds inside of LLVM's code base, including compile time changes, impact on transformations, and performance measurements. </p>

 <p><b>Confirmed mentors and their contacts</b></p>

 <ul>
   <li>Jan Hueckelheim jhueckelheim@anl.gov
   <li>Johannes Doerfert jdoerfert@llnl.gov
   <li>William Moses wmoses@mit.edu
 </ul>

 <p><b>Required / desired skills</b></p>

 <p>Required:</p>

 <ul>
   <li>Profiling skills and knowledge of statistical reasoning
 </ul>

 <p>Desired:</p>

 <ul>
   <li>Good understanding of the LLVM code base and optimization flow
 </ul>

 <p><b>Size of the project.</b></p>

 <p>medium</p>

 <p><b>An easy, medium or hard rating if possible</b></p>

 <p>easy</p>

 <p><b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/gsoc-2024-the-1001-thresholds-in-llvm/77235">URL</a>
 </p>

 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="gpu-libc">Performance tuning the GPU libc</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><b>Description</b></p>

 <p>
 We have begun work on a libc library targeting GPUs. This will allow users to call functions such as malloc or memcpy while executing on the GPU. However, it is important that these implementations be functional and performant. The goal of this project is to benchmark the implementations of certain libc functions on the GPU. Work would include writing benchmarks to test the current implementations as well as writing more optimal implementations.
 </p>

 <p><b>Expected outcomes</b></p>

   <p> In-depth performance analysis of libc functions. Measurements of the overhead of GPU-to-CPU remote procedure calls. More optimal implementations of 'libc' functions. </p>

 <p><b>Confirmed mentors and their contacts</b></p>

 <ul>
   <li>Joseph Huber joseph.huber@amd.com
   <li>Johannes Doerfert jdoerfert@llnl.gov
 </ul>

 <p><b>Required / desired skills</b></p>

 <p>Required:</p>

 <ul>
   <li>Profiling skills and understanding of GPU architecture
 </ul>

 <p>Desired:</p>

 <ul>
   <li>Experience with libc utilities
 </ul>

 <p><b>Size of the project.</b></p>

 <p>small</p>

 <p><b>An easy, medium or hard rating if possible</b></p>

 <p>easy</p>

 <p><b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/libc-gsoc-2024-performance-and-testing-in-the-gpu-libc/77042">URL</a>
 </p>

 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="gpu-first">Improve GPU First Framework</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><b>Description</b></p>

 <p>
   <a href="https://arxiv.org/abs/2306.11686">GPU First</a> is a methodology and framework that can enable any existing host code to execute the entire program on a GPU without any modification from users.
   The goal of this project is twofold:
   1) Port <a href="https://github.com/shiltian/llvm-project/tree/direct_gpu_compilation">host code</a> to handle RPC to the new plugin and rewrite it with the host RPC framework introduced in the GPU LibC project.
   2) Explore the support for MPI among multiple thread blocks on a single GPU, or even multiple GPUs.
 </p>

 <p><b>Expected outcomes</b></p>

   <p> More efficient GPU First framework that can support both NVIDIA and AMD GPUs. Optionally, upstream the framework. </p>

 <p><b>Confirmed mentors and their contacts</b></p>

 <ul>
   <li>Shilei Tian i@tianshilei.me
   <li>Johannes Doerfert jdoerfert@llnl.gov
   <li>Joseph Huber joseph.huber@amd.com
 </ul>

 <p><b>Required / desired skills</b></p>

 <p>Required:</p>

 <ul>
   <li>Good understanding of C++ and GPU architecture
   <li>Familiarity with GPUs and LLVM IR
 </ul>

 <p>Desired:</p>

 <ul>
   <li>Good understanding of the LLVM code base and OpenMP target offloading
 </ul>

 <p><b>Size of the project.</b></p>

 <p>medium</p>

 <p><b>An easy, medium or hard rating if possible</b></p>

 <p>medium</p>

 <p><b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/openmp-gsoc-2024-improve-gpu-first-framework/77048">URL</a>
 </p>

 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clangir-gpu">Compile GPU kernels using ClangIR</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description:</b>
     Heterogeneous programming models such as
     <a href="https://sycl.tech">SYCL</a>,
     <a href="https://www.openmp.org">OpenMP</a> and
     <a href="https://www.openacc.org">OpenACC</a> help developers to offload
     computationally intensive kernels to GPUs and other accelerators.
     <a href="https://mlir.llvm.org">MLIR</a> is expected to unlock new
     high-level optimisations and better code generation for the next generation
     of compilers for heterogeneous programming models. However, the availability
     of a robust MLIR-emitting C/C++ frontend is a prerequisite for these
     efforts.
   </p><p>
     The <a href="https://clangir.org">ClangIR</a> (CIR) project aims to
     establish a new intermediate representation (IR) for Clang. Built on top of
     MLIR, it provides a dialect for C/C++ based languages in Clang, and the
     necessary infrastructure to emit it from the Clang AST, as well as a
     lowering path to the LLVM-IR dialect. Over the last year, ClangIR has
     evolved into a mature incubator project, and a recent
     <a href="https://discourse.llvm.org/t/rfc-upstreaming-clangir/76587">RFC</a>
     on upstreaming it into the LLVM monorepo has seen positive comments and
     community support.
   </p><p>
     The overall goal of this GSoC project is to identify and implement missing
     features in ClangIR to make it possible to compile GPU kernels in the
     <a href="https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html">OpenCL C language</a>
     to LLVM-IR for the
     <a href="https://registry.khronos.org/SPIR-V">SPIR-V</a> target. The OpenCL
     to SPIR-V flow is a great environment for this project because a) it is
     <a href="https://clang.llvm.org/docs/OpenCLSupport.html">already supported</a>
     in Clang and b) OpenCL's work-item- and work-group-based programming model
     still captures modern GPU architectures well. The contributor will extend
     the AST visitors, the dialect and the LLVM-IR lowering, to add support e.g.
     for multiple address spaces, vector and custom floating point types, and the
     <code>spir_kernel</code> and <code>spir_func</code> calling conventions.
   </p><p>
     A good starting point for this work is the
     <a href="https://github.com/sgrauerg/polybenchGpu/tree/master/OpenCL">Polybench-GPU</a>
     benchmark suite. It contains self-contained small- to medium-sized OpenCL
     implementations of common algorithms. We expect only the device code (*.cl
     files) to be compiled via ClangIR. The existing OpenCL support in Clang can
     be used to create lit tests with reference LLVM-IR output to guide the
     development. Optionally, the built-in result verification and time
     measurements in Polybench could also be used to assess the correctness and
     quality of the generated code.
   </p>

   <p><b>Expected result:</b>
     Polybench-GPU's
     <a href="https://github.com/sgrauerg/polybenchGpu/blob/master/OpenCL/2DCONV/2DConvolution.cl"><code>2DCONV</code></a>,
     <a href="https://github.com/sgrauerg/polybenchGpu/blob/master/OpenCL/GEMM/gemm.cl"><code>GEMM</code></a> and
     <a href="https://github.com/sgrauerg/polybenchGpu/blob/master/OpenCL/CORR/correlation.cl"><code>CORR</code></a>
     OpenCL kernels can be compiled with ClangIR to LLVM-IR for SPIR-V.
   </p>

   <p><b>Skills:</b>
     Intermediate C++ programming skills and familiarity with basic compiler
     design concepts are required. Prior experience with LLVM IR, MLIR, Clang or
     GPU programming is a big plus, but it is not required if you are willing to
     learn.
   </p>

   <p><b>Project size:</b> Large</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/jopperm">Julian Oppermann</a>,
     <a href="https://github.com/Naghasan">Victor Lom&uuml;ller</a>,
     <a href="https://github.com/bcardosolopes">Bruno Cardoso Lopes</a>
   </p>

   <p><b>Discourse:</b>
     <a href="https://discourse.llvm.org/t/clangir-compile-gpu-kernels-using-clangir/76984">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="half-precision-libc">Half precision in LLVM libc</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description:</b></p>

   <p>
     Half precision is an IEEE 754 floating point format that has been widely
     used recently, especially in machine learning and AI. It has been
     standardized as _Float16 in the latest C23 standard, bringing its support to
     the same level as float or double data types. The goal for this project is
     to implement C23 half precision math functions in the LLVM libc library.
   </p>
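   <p>
     As a rough illustration of the format itself, the sketch below decodes an
     IEEE 754 binary16 bit pattern (1 sign bit, 5 exponent bits with bias 15,
     10 fraction bits) into a <code>float</code>. The helper name
     <code>half_bits_to_float</code> is hypothetical and is not part of LLVM
     libc; a real implementation would likely use <code>_Float16</code>
     directly where the compiler supports it.
   </p>

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration only; not an LLVM libc function.
// Decodes an IEEE 754 binary16 bit pattern into a float. Every binary16
// value is exactly representable as a float, so no rounding is involved.
inline float half_bits_to_float(std::uint16_t bits) {
  const int sign = (bits >> 15) & 0x1;
  const int exponent = (bits >> 10) & 0x1F;
  const int fraction = bits & 0x3FF;

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: value = fraction * 2^-24.
    magnitude = std::ldexp(static_cast<float>(fraction), -24);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity if the fraction is zero, NaN otherwise.
    magnitude = fraction == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: value = (1024 + fraction) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(1024 + fraction), exponent - 25);
  }
  return sign ? -magnitude : magnitude;
}
```

   <p>
     For example, the bit pattern <code>0x3C00</code> decodes to 1.0 and
     <code>0x7BFF</code> decodes to 65504, the largest finite half precision
     value.
   </p>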

   <p><b>Expected result:</b></p>

   <ul>
     <li> Set up the generated headers properly so that the type and the functions
          can be used with various compilers (+versions) and architectures. </li>
     <li> Implement generic basic math operations supporting half precision data
          types that work on supported architectures: x86_64, arm (32 + 64),
          risc-v (32 + 64), and GPUs. </li>
     <li> Implement specializations using compiler builtins or special hardware
          instructions to improve their performance whenever possible. </li>
     <li> If time permits, we can start investigating higher math functions for
          half precision. </li>
   </ul>

   <p><b>Skills:</b></p>

   <p>
     Intermediate C++ programming skills and familiarity with basic compiler
     design concepts are required. Prior experience with LLVM IR, MLIR, Clang or
     GPU programming is a big plus, but it is not required if you are willing to
     learn.
   </p>

   <p><b>Project size:</b> Large </p>

   <p><b>Difficulty:</b> Easy/Medium</p>

   <p><b>Confirmed Mentors:</b>
     <a href="mailto:lntue@google.com">Tue Ly</a>,
     <a href="mailto:joseph.huber@amd.com">Joseph Huber</a>
   </p>

   <p><b>Discourse:</b>
     <a href="https://discourse.llvm.org/t/libc-gsoc-2024-half-precision-in-llvm-libc/77027">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc23">Google Summer of Code 2023</a>
 </div>
 <!-- *********************************************************************** -->


 <div class="www_text">
   <p>
     Google Summer of Code 2023 was very successful for the LLVM project. For the
     list of accepted and completed projects, please see the Google Summer of
     Code <a href="https://summerofcode.withgoogle.com/archive/2023/organizations/llvm-compiler-infrastructure">website</a>.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>LLVM</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_new_jitlink_reopt">Re-optimization using JITLink</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     In Just-In-Time compilers we often choose a low optimization level to
     minimize compile time and improve launch times and latencies, however some
     functions (which we call hot functions) are used very frequently and for
     these functions it is worth optimizing more heavily. In general hot
     functions can only be identified at runtime (different inputs will cause
     different functions to become hot), so the aim of the reoptimization project
     is to build infrastructure to (1) detect hot functions at runtime and (2)
     compile them a second time at a higher optimization level, hence the name
     "re-optimization".
     <br /><br />
     There are many possible approaches to both parts of this problem. E.g. hot
     functions could be identified by sampling, or using existing profiling
     infrastructure, or by implementing custom instrumentation. Reoptimization
     could be applied to whole functions, or outlining could be used to enable
     optimization of portions of functions. Re-entry into the JIT infrastructure
     from JIT’d code might be implemented on top of existing lazy compilation, or
     via a custom path.
     <br /><br />
     Whatever design is adopted, the goal is that the infrastructure should be
     generic so that it can be used by other LLVM API clients, and should support
     out-of-process JIT-compilation (so some of the solution will be implemented
     in the ORC runtime).
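   <p>
     A minimal sketch of the two steps, assuming the simplest possible design:
     a call counter for hotness detection and a function-pointer stub for
     indirection. All names here (<code>ReoptStub</code>,
     <code>square_baseline</code>, <code>square_optimized</code>) are
     hypothetical and none of this is ORC or JITLink API; in a real JIT the
     optimized version would be produced by recompiling at a higher
     optimization level rather than being written by hand.
   </p>

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this does not use any LLVM API.
using IntFn = int (*)(int);

// Stand-in for code compiled at a low optimization level.
inline int square_baseline(int x) {
  const int n = x < 0 ? -x : x;
  int result = 0;
  for (int i = 0; i < n; ++i) result += n;
  return result;
}

// Stand-in for the same function after re-optimization.
inline int square_optimized(int x) { return x * x; }

class ReoptStub {
public:
  ReoptStub(IntFn baseline, IntFn optimized, std::uint64_t threshold)
      : target_(baseline), optimized_(optimized), threshold_(threshold) {}

  int call(int x) {
    // (1) Detect hotness with a simple call counter.
    if (calls_.fetch_add(1, std::memory_order_relaxed) + 1 == threshold_) {
      // (2) "Re-optimize": here we only redirect the stub to a faster
      // version that already exists.
      target_.store(optimized_, std::memory_order_release);
    }
    // All callers go through the stub, so the switch is transparent to them.
    return target_.load(std::memory_order_acquire)(x);
  }

  bool is_optimized() const {
    return target_.load(std::memory_order_acquire) == optimized_;
  }

private:
  std::atomic<IntFn> target_;
  std::atomic<std::uint64_t> calls_{0};
  IntFn optimized_;
  std::uint64_t threshold_;
};
```

   <p>
     With a threshold of 3, the first two calls run the baseline version and
     the third call onwards runs the optimized one. Sampling or existing
     profiling infrastructure, as mentioned above, would replace the counter
     in other designs.
   </p>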

   <p><b>Expected result:</b>
     <ul>
       <li>Improve ergonomics of indirection – ideally all forms of indirection
         (for re-optimization, lazy compilation, and procedure-linkage-tables)
         should be able to share a single stub (and/or binary rewriting metadata)
         at runtime.</li>
       <li>Implement basic re-optimization on top of the tidied up
         indirection.</li>
       <li>(Stretch goal) Garbage-collect unoptimized code that is no longer
         needed once the optimized version is available.</li>
     </ul>

   <p><b>Desirable skills:</b>
     Intermediate C++; Understanding of LLVM and the LLVM JIT in particular.

   <p><b>Project size:</b> Large.</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/vgvassilev">Vassil Vassilev</a>,
     <a href="https://github.com/lhames">Lang Hames</a>
   </p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/re-optimization-using-jitlink/68260">URL</a>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_new_jitlink_backends">JITLink new backends</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     JITLink is LLVM's new JIT linker API -- the low-level API that transforms
     compiler output (relocatable object files) into ready-to-execute bytes in
     memory. To do this, JITLink’s generic linker algorithm needs to be
     specialized to support the target object format (COFF, ELF, MachO), and
     architecture (arm, arm64, i386, x86-64). LLVM already has mature
     implementations of JITLink for MachO/arm64, MachO/x86-64, ELF/x86-64,
     ELF/aarch64 and COFF/x86-64, while the implementations for ELF/riscv,
     ELF/aarch32 and COFF/i386 are still relatively new.
     <br />
     You can either work on an entirely new architecture like PowerPC or eBPF,
     or complete one of the recently added JITLink implementations. In both cases
     you will likely reuse the existing generic code for one of the target object
     formats. You will also work on relocation resolution, populate PLTs and GOTs
     and wire up the ORC runtime for your chosen target.
     <br />
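   <p>
     To give a flavour of relocation resolution, the sketch below applies a
     32-bit PC-relative fixup using the common formula S + A - P (target
     address plus addend minus the address being patched), which is how
     relocations such as <code>R_X86_64_PC32</code> are defined. The
     <code>PCRel32Fixup</code> struct and <code>apply_pcrel32</code> function
     are hypothetical; JITLink's own data structures and per-target edge kinds
     differ.
   </p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical types for illustration; not JITLink's real representation.
struct PCRel32Fixup {
  std::uint64_t fixup_address;   // P: address of the 4 bytes being patched
  std::uint64_t target_address;  // S: final address of the target symbol
  std::int64_t addend;           // A: constant taken from the relocation
};

// Computes S + A - P and writes it into `location` as a 32-bit value in
// host byte order. Returns false if the result does not fit in a signed
// 32-bit field, in which case `location` is left untouched.
inline bool apply_pcrel32(const PCRel32Fixup &fixup, unsigned char *location) {
  // Do the arithmetic in unsigned 64-bit to avoid signed overflow, then
  // reinterpret the result as a signed displacement.
  const std::uint64_t raw = fixup.target_address +
                            static_cast<std::uint64_t>(fixup.addend) -
                            fixup.fixup_address;
  const std::int64_t value = static_cast<std::int64_t>(raw);
  if (value < std::numeric_limits<std::int32_t>::min() ||
      value > std::numeric_limits<std::int32_t>::max()) {
    return false;
  }
  const std::int32_t narrow = static_cast<std::int32_t>(value);
  std::memcpy(location, &narrow, sizeof(narrow));
  return true;
}
```

   <p>
     The out-of-range case is where a real linker would typically route the
     reference through a PLT or GOT entry placed close enough to the fixup.
   </p>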

   <p><b>Expected result:</b>
     Write a JITLink specialization for a not-yet-supported or incomplete
     format/architecture such as PowerPC, AArch32 or eBPF.

   <p><b>Desirable skills:</b>
     Intermediate C++; Understanding of LLVM and the LLVM JIT in particular;
     familiarity with your chosen format/architecture, and basic linker concepts
     (e.g. sections, symbols, and relocations).

   <p><b>Project size:</b> Large.</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/vgvassilev">Vassil Vassilev</a>,
     <a href="https://github.com/lhames">Lang Hames</a>,
     <a href="https://github.com/weliveindetail">Stefan Gränitz</a>
   </p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/jitlink-new-backends/68223">URL</a>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_improving_compile_times">Improving compile times</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     While the primary job of a compiler is to produce fast code (good run-time
     performance), it is also important that optimization doesn’t take too much
     time (good compile-time performance). The goal of this project is to improve
     compile-time without hurting optimization quality.
     <br />
     The general approach to this project is:
     <ol>
       <li>Pick a workload to optimize. For example, this could be a file from
         <a href="https://github.com/llvm/llvm-test-suite/tree/main/CTMark">CTMark</a>
         compiled in a certain build configuration (e.g. <code>-O0 -g</code> or
         <code>-O3 -flto=thin</code>).</li>
       <li>Collect profiling information. This could involve compiler options like
         <code>-ftime-report</code> or <code>-ftime-trace</code> for a high-level
         overview, as well as <code>perf record</code> or
         <code>valgrind --tool=callgrind</code> for a detailed profile.</li>
       <li>Identify places that are unexpectedly slow. This is heavily workload
         dependent.</li>
       <li>Try to optimize an identified hotspot, ideally without impacting generated
         code. The <a href="https://llvm-compile-time-tracker.com/">compile-time tracker</a>
         can be used to quickly evaluate impact on CTMark.</li>
     </ol>
     As a disclaimer, it should be noted that outside of pathological cases,
     compilation doesn’t tend to have a convenient hotspot where 90% of the time
     is spent; instead, the time is spread out across many passes. As such, individual
     improvements also tend to have only small impact on overall compile-time.
     Expect to do 10 improvements of 0.2% each, rather than one improvement of 2%.
   </p>
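   <p>
     The arithmetic behind that expectation can be sketched as follows. Small
     savings compound multiplicatively, so ten changes that each remove 0.2% of
     the remaining compile time add up to slightly under 2%. The function names
     are hypothetical, and the geometric mean helper is only an assumption
     about how per-file time ratios might be summarized.
   </p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Given per-change relative savings (0.002 means 0.2% less time), returns the
// total fraction of compile time saved when all changes are applied.
inline double combined_saving(const std::vector<double> &savings) {
  double remaining = 1.0;
  for (double s : savings) remaining *= (1.0 - s);
  return 1.0 - remaining;
}

// Geometric mean of per-file time ratios (new time divided by old time).
// A result below 1.0 means the files got faster on average.
inline double geomean(const std::vector<double> &ratios) {
  assert(!ratios.empty() && "need at least one ratio");
  double log_sum = 0.0;
  for (double r : ratios) log_sum += std::log(r);
  return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

   <p>
     For instance, <code>combined_saving</code> applied to ten entries of
     <code>0.002</code> yields roughly 0.0198.
   </p>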

   <p><b>Expected result:</b>
     Substantial improvements on some individual files (multiple percent), and a
     small improvement on overall geomean compile-time.</p>

   <p><b>Desirable skills:</b>
     Intermediate C++. Familiarity with profiling tools (especially if you are
     not on Linux, in which case I won’t be able to help).</p>

   <p><b>Project size:</b> Either medium or large.</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Confirmed Mentor:</b> <a href="https://github.com/nikic">Nikita Popov</a></p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/llvm-improving-compile-times/68094">URL</a>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_addressing_rust_optimization_failures">Addressing Rust optimization failures</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The <a href="https://www.rust-lang.org/">Rust programming language</a> uses
     LLVM for code generation, and heavily relies on LLVM’s optimization
     capabilities. However, there are many cases where LLVM fails to optimize
     typical code patterns that are emitted by rustc. Such issues are reported
     using the <a href="https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AI-slow">I-slow</a>
     and/or <a href="https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AA-LLVM">A-LLVM</a> labels.
     <br />
     The usual approach to fixing these issues is:
     <ol>
       <li>Inspect the <code>--emit=llvm-ir</code> output on
         <a href="https://rust.godbolt.org/">Godbolt</a>.</li>
       <li>Create an LLVM IR test case that is not optimized when run through
         <code>opt -O3</code>.</li>
       <li>Identify a minimal missing transform and prove its correctness
         using <a href="https://alive2.llvm.org/ce/">alive2</a>.</li>
       <li>Identify which LLVM pass or passes could perform the transform.</li>
       <li>Add necessary test coverage and implement the transform.</li>
       <li>(Much later: Check that the issue is really resolved after the next
         major LLVM version upgrade in Rust.)</li>
     </ol>
     The goal of this project is to address some of the easier optimization
     failures. This means that in some cases, the process would stop after step 3
     or 4 without proceeding to implementation, because it’s unclear how the issue
     could be addressed, or it would take a large amount of effort. Having an
     analysis of the problem is still valuable in that case.
   </p>

   <p><b>Expected result:</b>
     Fixes for a number of easy to medium Rust optimization failures. Preliminary
     analysis for some failures even if no fix was implemented.</p>

   <p><b>Desirable skills:</b>
     Intermediate C++ for implementation. Some familiarity with LLVM (at least
     ability to understand LLVM IR) for analysis. Basic Rust knowledge (enough
     to read, but not write Rust).</p>

   <p><b>Project size:</b> Either medium or large.</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Confirmed Mentor:</b> <a href="https://github.com/nikic">Nikita Popov</a></p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/llvm-addressing-rust-optimization-failures-in-llvm/68096">URL</a>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-repl-autocompletion">Implement autocompletion in clang-repl</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The Clang compiler is part of the LLVM compiler infrastructure and supports
     various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
     Clang enables them to be used as libraries, and has led to the creation of
     an entire compiler-assisted ecosystem of tools. The relatively friendly
     codebase of Clang and advancements in the JIT infrastructure in LLVM further
     enable research into different methods for processing C++ by blurring the
     boundary between compile time and runtime. Challenges include incremental
     compilation and fitting compile/link time optimizations into a more dynamic
     environment.
     <br /> <br />
     Incremental compilation pipelines process code chunk-by-chunk by building an
     ever-growing translation unit. Code is then lowered into the LLVM IR and
     subsequently run by the LLVM JIT. Such a pipeline allows creation of
     efficient interpreters. The interpreter enables interactive exploration and
     makes the C++ language more user friendly. The incremental compilation mode
     is used by the interactive C++ interpreter, Cling, initially developed to
     enable interactive high-energy physics analysis in a C++ environment.
     <br /> <br />
     <a href="https://compiler-research.org/">Our group</a> is working to
     incorporate and possibly redesign parts of Cling in Clang mainline through a
     new tool, clang-repl. The project aims at the design and implementation of
     robust autocompletion when users type C++ at the prompt of clang-repl.
     For example:
     <pre>
       [clang-repl] class MyLongClassName {};
       [clang-repl] My&lt;tab&gt;
       // list of suggestions.
     </pre>
   </p>
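   <p>
     The simplest form of the behaviour shown above is prefix matching over the
     identifiers declared so far. The sketch below assumes the set of known
     names is already available as a list of strings; the function
     <code>complete_prefix</code> is hypothetical and does not reflect how
     Clang's code completion is implemented.
   </p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns every known identifier that starts with
// `prefix`, in sorted order. An empty prefix matches everything.
inline std::vector<std::string>
complete_prefix(const std::vector<std::string> &known,
                const std::string &prefix) {
  std::vector<std::string> matches;
  for (const std::string &name : known) {
    // compare() on the first prefix.size() characters; names shorter than
    // the prefix compare unequal and are skipped.
    if (name.compare(0, prefix.size(), prefix) == 0) matches.push_back(name);
  }
  std::sort(matches.begin(), matches.end());
  return matches;
}
```

   <p>
     Given the names <code>MyLongClassName</code>, <code>main</code> and
     <code>printf</code>, the prefix <code>My</code> yields only
     <code>MyLongClassName</code>. Semantic completion, which the project also
     targets, needs type and grammar information that plain string matching
     cannot provide.
   </p>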

   <p><b>Expected result:</b>
     There are several foreseen tasks:
     <ul>
       <li>Research the current approaches for autocompletion in clang such as
         <code>clang -code-completion-at=file:line:column</code>.</li>
       <li>Implement a version of the autocompletion support using the partial
         translation unit infrastructure in clang’s libInterpreter.</li>
       <li>Investigate the requirements for semantic autocompletion which takes
         into account the exact grammar position and semantics of the code. E.g.:
         <pre>
           [clang-repl] struct S {S* operator+(S&amp;) { return nullptr;}};
           [clang-repl] S a, b;
           [clang-repl] v = a + &lt;tab&gt; // shows b as the only acceptable choice here.
         </pre>
       </li>
       <li>Present the work at the relevant meetings and conferences.</li>
     </ul>
   </p>

   <p><b>Project size:</b> Large.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentor:</b>
     <a href="https://github.com/vgvassilev">Vassil Vassilev</a>
   </p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-repl-implement-autocompletion-in-clang-repl/60364">URL</a>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-modules-build-daemon">Modules build daemon: build system agnostic support for explicitly built modules</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b> Clang currently handles modules independently in each
     <code>clang</code> instance using the filesystem for synchronization of which instance builds
     a given module. This has many issues with soundness and performance due to tradeoffs made for
     module reuse and filesystem contention.</p>
   <p>Clang has another way of building modules, explicitly built modules, that currently requires
     build system changes to adopt. Here the build system determines which modules are needed, for
     example by using <a href="https://github.com/llvm/llvm-project/tree/main/clang/tools/clang-scan-deps">clang-scan-deps</a>,
     and ensures those modules are built before running the <code>clang</code> compile task that
     needs them.</p>
   <p>In order to allow adoption of this new way of building modules without major build system work
     we need a module build daemon. With a small change to the command line, clang will connect to
     this daemon and ask for the modules it needs. The module build daemon then either returns an
     existing valid module, or builds and then returns it.</p>
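   <p>
     The "return an existing valid module, or build it and then return it"
     behaviour amounts to a keyed cache. The sketch below is an in-process
     stand-in with hypothetical names (<code>ModuleCache</code>,
     <code>BuiltModule</code>, <code>context_hash</code>); it ignores IPC,
     concurrency and invalidation, all of which a real daemon would need, and
     it assumes that a module name plus some hash of the relevant command-line
     options is enough to identify a reusable build.
   </p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical record describing a built module; not a Clang data structure.
struct BuiltModule {
  std::string name;
  std::string context_hash;  // stands in for "options that affect the build"
  int build_id;              // which build produced this module
};

class ModuleCache {
public:
  using Builder =
      std::function<BuiltModule(const std::string &, const std::string &)>;

  explicit ModuleCache(Builder builder) : build_(std::move(builder)) {}

  // Returns the cached module for (name, context_hash), building it first
  // if no such module exists yet.
  const BuiltModule &get_or_build(const std::string &name,
                                  const std::string &context_hash) {
    const auto key = std::make_pair(name, context_hash);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      it = cache_.emplace(key, build_(name, context_hash)).first;
    }
    return it->second;
  }

  std::size_t size() const { return cache_.size(); }

private:
  Builder build_;
  std::map<std::pair<std::string, std::string>, BuiltModule> cache_;
};
```

   <p>
     Two requests for the same module with the same options trigger a single
     build, while a request with different options triggers a second one.
   </p>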
   <p>There is an existing open source dependency scanning daemon that is in an llvm-project fork.
     This only handles file dependencies, but has an IPC mechanism. This IPC system could be used as
     a base for the modules build daemon, but does need to be extended to work on Windows.</p>
   <p><b>Expected result:</b> A normal project using Clang modules with an existing build system
     (like Make or CMake) can be built using only explicitly built modules via a modules build
     daemon.</p>
   <p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity with compilers;
     familiarity with Clang is an asset, but not required.</p>
   <p><b>Project size:</b> 175h or 350h depending on reuse of IPC</p>
   <p><b>Difficulty:</b> medium</p>
   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/Bigcheese">Michael Spencer</a>,
     <a href="https://github.com/jansvoboda11">Jan Svoboda</a>
   </p>
   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-modules-build-daemon-build-system-agnostic-support-for-explicitly-built-modules/68224">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-extract-api-categories">ExtractAPI Objective-C categories</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b> <a href="https://github.com/apple/swift-docc">Swift-DocC</a> is
   the canonical documentation compiler for the Swift OSS project. However,
   Swift-DocC is not Swift specific and
   uses <a href="https://github.com/apple/swift-docc-symbolkit/blob/main/openapi.yaml">SymbolKit</a>'s
   language-agnostic JSON-based symbol graph format to understand which
   symbols are available in the code. This way, any language can be supported by
   Swift-DocC as long as there is a symbol graph generator.</p>
   <p>Clang supports symbol graph generation for C and Objective-C as described
   in <a href="https://discourse.llvm.org/t/rfc-clang-support-for-api-information-generation-in-json/58845">[RFC]
   clang support for API information generation in JSON</a>. Today, support for
   Objective-C categories is incomplete. On the one hand, if the category extends a
   type in the current module, the category members are assumed to belong to the
   extended type itself. On the other hand, if the extended type belongs to
   another module, the category is ignored. Nonetheless, it is common to extend
   types belonging to other modules in Objective-C as part of the public API of
   the module. The goal of this project is to extend the symbol graph format to
   accommodate Objective-C categories and to implement support for generating
   this information both through clang and through libclang.</p>
   <p><b>Expected result:</b> Adding the necessary support to clang's symbol graph
   generator and in libclang for describing categories of symbols defined in
   other modules. This might involve additions to SymbolKit that would need to be
   discussed with that community.</p>
   <p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity
   with clang and Objective-C are assets but not required.</p>
   <p><b>Project size:</b> Medium</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/daniel-grumberg">Daniel Grumberg</a>,
     <a href="https://github.com/zixu-w">Zixu Wang</a>,
     <a href="https://github.com/ributzka">Juergen Ributzka</a>
   </p>
   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-extractapi-objective-c-categories/68370">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-extract-api-cpp-support">ExtractAPI C++ Support</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b> <a href="https://github.com/apple/swift-docc">Swift-DocC</a> is
   the canonical documentation compiler for the Swift OSS project. However,
   Swift-DocC is not Swift specific and
   uses <a href="https://github.com/apple/swift-docc-symbolkit/blob/main/openapi.yaml">SymbolKit</a>'s
   language-agnostic JSON-based symbol graph format to understand which
   symbols are available in the code. This way, any language can be supported by
   Swift-DocC as long as there is a symbol graph generator.</p>
   <p>Clang supports symbol graph generation for C and Objective-C as described
   in <a href="https://discourse.llvm.org/t/rfc-clang-support-for-api-information-generation-in-json/58845">[RFC]
   clang support for API information generation in JSON</a>.</p>
   <p>Currently the emitted symbol graph format does not support various C++
   constructs such as templates and exceptions and the symbol graph generator
   does not fully understand C++. This project aims to introduce support for
   various C++ constructs in the symbol graph format and to implement support for
   generating this data in clang.</p>
   <p><b>Expected result:</b> Adding the necessary support to clang's symbol graph
   generator and in libclang for describing categories of symbols defined in
   other modules. This will involve additions to SymbolKit that would need to be
   discussed with that community.</p>
   <p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity
   with clang and Objective-C are assets but not required.</p>
   <p><b>Project size:</b> Large</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/daniel-grumberg">Daniel Grumberg</a>,
     <a href="https://github.com/zixu-w">Zixu Wang</a>,
     <a href="https://github.com/ributzka">Juergen Ributzka</a>
   </p>
   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/extractapi-c-support/68371">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-extract-api-while-building">ExtractAPI while building</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b> <a href="https://github.com/apple/swift-docc">Swift-DocC</a> is
   the canonical documentation compiler for the Swift OSS project. However,
   Swift-DocC is not Swift specific and
   uses <a href="https://github.com/apple/swift-docc-symbolkit/blob/main/openapi.yaml">SymbolKit</a>'s
   language-agnostic JSON-based symbol graph format to understand which
   symbols are available in the code. This way, any language can be supported by
   Swift-DocC as long as there is a symbol graph generator.</p>
   <p>Clang supports symbol graph generation for C and Objective-C as described
   in <a href="https://discourse.llvm.org/t/rfc-clang-support-for-api-information-generation-in-json/58845">[RFC]
   clang support for API information generation in JSON</a>.</p>
   <p>Currently users can use clang to generate symbol graph files using
   the <code>clang -extract-api</code> command line interface or generating
   symbol graphs for a specific symbol using the libclang interface. This project
   would entail adding a third mode that would generate the symbol graph output
   as a side-effect of a regular compilation job. This can enable using the
   symbol graph format as a lightweight alternative to clang Index or clangd
   for code intelligence services.</p>
   <p><b>Expected result:</b> Enable generating symbol graph files during a
   regular compilation (or module build); provide a tool to merge symbol graph
   files in the same way a static linker links individual object files; extend
   clang Index to support all the information contained by symbol graph
   files.</p>
   <p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity
   with clang and Objective-C are assets but not required.</p>
   <p><b>Project size:</b> Medium</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/daniel-grumberg">Daniel Grumberg</a>,
     <a href="https://github.com/zixu-w">Zixu Wang</a>,
     <a href="https://github.com/ributzka">Juergen Ributzka</a>
   </p>
   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-extractapi-while-building/68372">URL</a></p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-improve-diagnostics2">Improve Clang diagnostics</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description:</b>
   The diagnostics clang emits are ultimately its interface to the developer. While the diagnostics are generally good, there are some rough edges that need to be ironed out. Some cases can be improved by special-casing them in the compiler as well.
   </p>

   <p>
   As one can see from Clang’s issue tracker, there are <a href="https://github.com/llvm/llvm-project/issues?page=2&q=is%3Aopen+is%3Aissue+label%3Aclang%3Adiagnostics">lots of issues</a> open against clang’s diagnostics.
   </p>

   <p>
   This project does not aim to implement one big feature but instead focuses on smaller, incremental improvements to Clang’s diagnostics.
   </p>

   <p>
   Possible example issues to resolve:
   <ul>
     <li><a href="https://github.com/llvm/llvm-project/issues/59872">Calling nullptr function pointer in a constexpr function results in poor diagnostic</a></li>
     <li><a href="https://github.com/llvm/llvm-project/issues/58601">Print name of uninitialized subobject (instead of type)</a></li>
     <li><a href="https://github.com/llvm/llvm-project/issues/57906">https://github.com/llvm/llvm-project/issues/57906</a></li>
     <li><a href="https://github.com/llvm/llvm-project/issues/57337">clang(++) unhelpful frame-larger-than warning, very small stack frame exceeding very large limit</a></li>
     <li>Any other diagnostics issue you find interesting or ran into personally.</li>
   </ul>
   </p>


   <p><b>Expected outcomes</b>:
   At least three fixed smaller diagnostics issues, or one larger implemented diagnostics improvement.
   </p>

   <p><b>Confirmed Mentor:</b> <a href="https://github.com/tbaederr">Timm Bäder</a></p>

   <p><b>Desirable skills:</b>
     <ul>
       <li>Intermediate C++ knowledge.</li>
       <li>Preferably experience in the Clang code base, since the issues mentioned can have their root cause in various parts of it.</li>
       <li>Preferably an already working local LLVM build</li>
     </ul>
   </p>

   <p><b>Project type:</b> Medium/200 hr</p>

   <p><b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/improve-clang-diagnostics-2/68900/3">URL</a>
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-tutorials-clang-repl">Tutorial development with clang-repl</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description:</b>
     The Clang compiler is part of the LLVM compiler infrastructure and supports
     various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
     Clang enables them to be used as libraries, and has led to the creation of
     an entire compiler-assisted ecosystem of tools. The relatively friendly
     codebase of Clang and advancements in the JIT infrastructure in LLVM further
     enable research into different methods for processing C++ by blurring the
     boundary between compile time and runtime. Challenges include incremental
     compilation and fitting compile/link time optimizations into a more dynamic
     environment.
   </p>

   <p>
     Incremental compilation pipelines process code chunk-by-chunk by building an
     ever-growing translation unit. Code is then lowered into the LLVM IR and
     subsequently run by the LLVM JIT. Such a pipeline allows creation of
     efficient interpreters. The interpreter enables interactive exploration and
     makes the C++ language more user friendly. The incremental compilation mode
     is used by the interactive C++ interpreter, Cling, initially developed to
     enable interactive high-energy physics analysis in a C++ environment.
   </p>

   <p>
     We invest efforts to incorporate and possibly redesign parts of Cling in
     Clang mainline through a new tool, clang-repl. The project aims to implement
     tutorials demonstrating the capabilities of clang-repl and to investigate
     adoption of clang-repl in the xeus-clang-repl prototype, which allows
     writing C++ in Jupyter.
   </p>

   <p><b>Expected result:</b>
     There are several foreseen tasks:
     <ul>
       <li>Write several tutorials demonstrating the current capabilities of
         clang-repl.</li>
       <li>Investigate the requirements for adding clang-repl as a backend to
         xeus-cling.</li>
       <li>Improve the xeus kernel protocol for clang-repl.</li>
       <li>Prepare a blog post about clang-repl and possibly Jupyter.
         Present the work at the relevant meetings and conferences.</li>
     </ul>
   </p>


   <p><b>Confirmed Mentor:</b>
     <a href="https://github.com/vgvassilev">Vassil Vassilev</a>,
     <a href="https://github.com/davidlange6">David Lange</a>
   </p>

   <p><b>Desirable skills:</b>
     Intermediate C++; Understanding of Clang and the Clang API in particular
   </p>

   <p><b>Project type:</b> Medium</p>

   <p><b>Discourse</b>
   <a href="https://discourse.llvm.org/t/clang-repl-tutorial-development-with-clang-repl/60365">URL</a>
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-repl-wasm">Add WebAssembly Support in clang-repl</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description:</b>
     The Clang compiler is part of the LLVM compiler infrastructure and supports
     various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
     Clang enables them to be used as libraries, and has led to the creation of
     an entire compiler-assisted ecosystem of tools. The relatively friendly
     codebase of Clang and advancements in the JIT infrastructure in LLVM further
     enable research into different methods for processing C++ by blurring the
     boundary between compile time and runtime. Challenges include incremental
     compilation and fitting compile/link time optimizations into a more dynamic
     environment.
   </p>

   <p>
     Incremental compilation pipelines process code chunk-by-chunk by building an
     ever-growing translation unit. Code is then lowered into the LLVM IR and
     subsequently run by the LLVM JIT. Such a pipeline allows creation of
     efficient interpreters. The interpreter enables interactive exploration and
     makes the C++ language more user friendly. The incremental compilation mode
     is used for interactive C++ in Jupyter via the xeus kernel protocol.
     Newer versions of the protocol allow in-browser execution, which opens up
     further possibilities for clang-repl and Jupyter.
   </p>

   <p>
     We invest efforts to incorporate and possibly redesign parts of Cling in
     Clang mainline through a new tool, clang-repl. The project aims to add
     WebAssembly support in clang-repl and adopt it in xeus-clang-repl to aid
     Jupyter-based C++.
   </p>

   <p><b>Expected result:</b>
     There are several foreseen tasks:
     <ul>
       <li>Investigate feasibility of generating WebAssembly in a similar way to
         the new <a href="https://reviews.llvm.org/D146389">interactive CUDA support</a>.</li>
       <li>Enable generating WebAssembly in clang-repl.</li>
       <li>Adopt the feature in xeus-clang-repl.</li>
       <li>Prepare a blog post about clang-repl and possibly Jupyter.
         Present the work at the relevant meetings and conferences.</li>
     </ul>
   </p>


   <p><b>Confirmed Mentor:</b>
     <a href="https://github.com/vgvassilev">Vassil Vassilev</a>,
     <a href="https://github.com/alexander-penev">Alexander Penev</a>
   </p>

   <p><b>Desirable skills:</b>
     Good C++; Understanding of Clang and the Clang API and the LLVM JIT in particular
   </p>

   <p><b>Project type:</b> Large</p>

   <p><b>Discourse</b>
   <a href="https://discourse.llvm.org/t/clang-repl-add-webassembly-support-in-clang-repl/69419">URL</a>
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_lld_embedded">LLD Linker Improvements for Embedded Targets</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b>
     The GNU toolchain is widely used for building embedded targets. There is growing momentum in
     the Clang/LLVM community towards improving the Clang toolchain to support embedded targets.
     Using the Clang toolchain as an alternative can help improve code quality, find and fix
     security bugs, improve the developer experience, and take advantage of the new ideas
     surrounding the Clang/LLVM community's support for embedded devices.
   </p>
   </p>
   <p><b>A non-comprehensive list of improvements that can be made to LLD</b>:
   <ul>
     <li>
       <p><b>--print-memory-usage support</b></p>
       <p>"--print-memory-usage" in the GNU linker provides a breakdown of the memory used in each
         memory region defined in the linker script. Embedded developers use this flag to
         understand the impact of a change on memory. Embedded systems often define multiple
         memory regions with different space constraints. Supporting this flag in LLD will help
         projects that wish to adopt the Clang toolchain.</p>
     </li>
     <li>
       <p><b>Linkmap</b></p>
       <p>Currently, the LLD linker's linkmap output is not as rich as the BFD linker output.
         Achieving feature parity on linkmap output will be highly
         useful in analyzing the binaries created by the LLD linker. Further, outputting linkmap in
         different formats (current LLD output, BFD, and JSON) can help build automation tools for
         investigating the artifacts produced by the linker.</p>
     </li>
     <li>
       <p><b>--print-gc-sections improvement</b></p>
       <p>When the "--print-gc-sections" flag is enabled, LLD prints the sections that were
         discarded during the linking process. This information currently does not include the
         mapping between the symbol and the section groups, which is useful for debugging.
         Preserving this information during the linking process will require modifications to
         internal linker data structures.</p>
     </li>
   </ul>
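   <p>To make the "--print-memory-usage" idea concrete, here is a minimal Python sketch of
     the kind of per-region accounting such a flag reports. It is only an illustration, not
     LLD or GNU linker code: the <tt>Region</tt> and <tt>Section</tt> types, the
     <tt>memory_usage</tt> function, and the addresses in the usage note are all hypothetical,
     and the sketch simply sums section sizes by start address, which may differ from how a
     real linker tracks region usage.</p>

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A memory region as it might be declared in a linker script (hypothetical type)."""
    name: str
    origin: int
    length: int


class Section(NamedTuple):
    """An output section placed at a final address (hypothetical type)."""
    name: str
    address: int
    size: int


def memory_usage(regions, sections):
    """Return (region name, used bytes, region size, percent used) for each region.

    Assumption: a section belongs to the first region containing its start address.
    """
    used = {region.name: 0 for region in regions}
    for section in sections:
        for region in regions:
            if section.address in range(region.origin, region.origin + region.length):
                used[region.name] += section.size
                break
    report = []
    for region in regions:
        percent = 100.0 * used[region.name] / region.length if region.length else 0.0
        report.append((region.name, used[region.name], region.length, percent))
    return report
```

   <p>For example, with a 64 KiB <tt>FLASH</tt> region at <tt>0x08000000</tt> holding 20 KiB
     of code and read-only data, the sketch reports 31.25% of that region as used.</p>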
   <p><b>Project size:</b> Medium or Large</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Skills:</b> C++</p>
   <p><b>Expected result</b>:
     <ul>
       <li>Implementation of "--print-memory-usage" flag.</li>
       <li>Support for two new linkmap output formats: BFD and JSON.</li>
       <li>Improved "--print-gc-sections" output to include information about the surviving symbols.</li>
     </ul>
   </p>
   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/Prabhuk">Prabhu Rajasekaran</a>
     <a href="https://github.com/petrhosek">Petr Hosek</a>
   </p>
   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/lld-linker-improvements-for-embedded/68129">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_mlir_presburger_opt">Optimizing MLIR’s Presburger library </a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><strong>Description</strong>: MLIR’s Presburger Library, FPL (<a href="https://grosser.science/FPL">https://grosser.science/FPL</a>), provides mathematical abstractions for polyhedral compilation and analysis. The main abstraction that the library provides is a set of integer tuples defined by a system of affine inequality constraints. The library supports standard set operations over such sets. The result will be a set defined by another constraint system, possibly having more constraints. When many set operations are performed in sequence, the constraint system may become very large, negatively impacting performance. There are several potential ways to simplify the constraint system; however, this involves performing additional computations. Thus, spending more time on more aggressive simplifications may make each individual operation slower, but at the same time, insufficient simplifications can make sequences of operations slow due to an explosion in constraint system size. The aim of this project is to find the right balance between the two.</p>
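 <p>The tradeoff described above can be illustrated with a small Python sketch. It is not the
   FPL implementation: the constraint encoding (a tuple of integer coefficients plus a
   constant, meaning <tt>coeffs . x + const &gt;= 0</tt>) and the functions
   <tt>normalize</tt>, <tt>simplify</tt> and <tt>intersect</tt> are assumptions made for
   illustration. The sketch applies one cheap simplification only: it normalizes each
   inequality by the GCD of its coefficients and keeps the tightest constant for each
   coefficient vector.</p>

```python
from functools import reduce
from math import gcd


def normalize(coeffs, const):
    """Divide an inequality coeffs.x + const >= 0 by the GCD of its coefficients.

    Flooring the constant is valid because only integer points are of interest.
    """
    divisor = reduce(gcd, (abs(c) for c in coeffs), 0)
    if divisor in (0, 1):
        return tuple(coeffs), const
    return tuple(c // divisor for c in coeffs), const // divisor


def simplify(constraints):
    """Drop duplicate and dominated inequalities that share a coefficient vector."""
    tightest = {}
    for coeffs, const in constraints:
        coeffs, const = normalize(coeffs, const)
        # For identical coefficients, the smaller constant is the stronger bound.
        tightest[coeffs] = min(const, tightest.get(coeffs, const))
    return sorted(tightest.items())


def intersect(first, second):
    """Intersect two sets by concatenating their constraints, then simplifying."""
    return simplify(list(first) + list(second))
```

 <p>Intersecting <tt>{0 &lt;= x &lt;= 10}</tt> with <tt>{x &gt;= 2, x &lt;= 10, 2x &gt;= 3}</tt>
   naively yields five inequalities; this sketch reduces them to two. More aggressive
   simplifications would remove more redundancy at a higher cost per operation, which is the
   balance the project sets out to study.</p>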
 <p><strong>The goals of this project:</strong></p>
 <ul>
 <li>Understand the library&#39;s performance in terms of runtime and output size.</li>
 <li>Optimize the library by finding the best output size and performance tradeoff.</li>
 </ul>
 <p><strong>Expected outcomes</strong>:</p>
 <ul>
 <li>Benchmarking the performance and output constraint complexity of the primary operations of the library.</li>
 <li>Implementing simplification heuristics.</li>
 <li>A better understanding of which simplification heuristics improve overall performance enough to be worth the additional computational cost.</li>
 </ul>
 <p><strong>Desirable skills</strong>: Intermediate C++, Experience in benchmarking</p>
 <p><strong>Project size</strong>: Large</p>
 <p><strong>Difficulty</strong>: Medium</p>
 <p><strong>Confirmed mentors</strong>: <a href="https://github.com/Groverkss">Kunwar Grover</a></p>
 <p><strong>Discourse</strong>: <a href="https://discourse.llvm.org/t/mlir-optimizing-mlir-s-presburger-library/68213/1">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_mlir_query">Interactively query MLIR IR</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
 <p><strong>Description</strong>:
   The project aims to develop an interactive query language for MLIR that enables developers to query the MLIR IR dynamically.
   The tool will provide a REPL (or command-line) interface to enable users to query various properties of MLIR code, such as
   "isConstant" and "resultOf". The proposed tool is intended to be similar to clang-query, which allows developers to match
   AST expressions in C++ code using a TUI with autocomplete and other features.
 </p>
 <p><strong>The goals of this project:</strong></p>
 <ul>
 <li>Understand the MLIR IR representation and the common explorations users perform.</li>
 <li>Implement a REPL to execute queries over MLIR IR.</li>
 </ul>
 <p><strong>Expected outcomes</strong>:</p>
 <ul>
 <li>A standalone tool that can be used to interactively explore IR.</li>
 <li>Implement common matchers that are usable by the tool.</li>
 <li>(stretch) Enable extracting parts of the IR matched by query into self-contained IR snippets.</li>
 </ul>
 <p><strong>Desirable skills</strong>: Intermediate C++, Experience in writing/debugging peephole optimizations</p>
 <p><strong>Project size</strong>: Either medium or large.</p>
 <p><strong>Difficulty</strong>: Medium</p>
 <p><strong>Confirmed mentors</strong>: <a href="https://github.com/jpienaar">Jacques Pienaar</a></p>
 <p><strong>Discourse</strong>: <a href="https://discourse.llvm.org/t/gsoc-proposal-interactive-mlir-query-tool-to-make-exploring-the-ir-easier/69601">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_mlgo_latency_model">Better performance models for MLGO training</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>
     We are using machine-guided compiler optimizations ("MLGO") for register allocation eviction and inlining for size, in
     real-life deployments. The ML models have been trained with reinforcement learning algorithms. Expanding to more
     performance areas is currently impeded by the poor prediction quality of our performance estimation models. Improving
     those is critical to the effectiveness of reinforcement learning training algorithms, and therefore to enabling applying
     MLGO systematically to more optimizations.
   </p>
   <p><b>Project size:</b> either 175 or 350 hr.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Skills:</b> C/C++, some compiler experience, some Python. ML experience is a bonus.</p>
   <p><b>Expected outcomes</b>: Better modeling of the execution environment by including additional runtime/profiling
     information, such as additional PMU data, LLC miss probabilities or branch mispredictions. This involves (1) building
     a data collection pipeline that covers additional runtime information, (2) modifying the ML models to allow processing
     this data, and (3) modifying the training and inference process for the models to make use of this data.</p>
   <p>Today, the models are almost pure static analysis; they see the instructions, but they make one-size-fits-all
     assumptions about the execution environment and the runtime behavior of the code. The goal of this project is to move
     from static analysis towards more dynamic models that better represent code the way it actually executes.</p>
   <p><b>Mentors</b>
     Ondrej Sykora, Mircea Trofin, Aiden Grossman
   </p>
   <p>
   <b>Discourse</b>
   <a href="https://discourse.llvm.org/t/better-performance-models-for-mlgo-training/68219">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang_analyzer_taint_analysis">Improve and Stabilize the Clang Static Analyzer's "Taint Analysis" Checks</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The Clang static analyzer comes with an experimental implementation of
     taint analysis, a security-oriented analysis technique built to warn
     the user about flow of attacker-controlled ("tainted") data into
     sensitive functions that may behave in unexpected and dangerous ways
     if the attacker is able to forge the right input. The programmer can address
     such warnings by properly "sanitizing" the tainted data in order to
     eliminate these dangerous inputs. A common example of a problem that can be
     caught this way is <a href="https://xkcd.com/327/">SQL injections</a>.
     A much simpler example, which is arguably much more relevant to users
     of Clang, is buffer overflow vulnerabilities caused by attacker-controlled
     numbers used as loop bounds while iterating over stack or heap arrays, or
     passed as arguments to low-level buffer manipulating functions such as
     <tt>memcpy()</tt>.
   </p>
   <p>
     Being a static symbolic execution engine, the static analyzer implements
     taint analysis by simply maintaining a list of "symbols" (named unknown
     numeric values) that were obtained from known taint sources during the
     symbolic simulation. Such symbols are then treated as potentially taking
     arbitrary concrete values, as opposed to the general case of taking an
     unknown subset of possible values. For example, division by an unchecked
     unknown value doesn't necessarily warrant a division by zero warning,
     because it's typically not known whether the value can be zero or not.
     However, division by an unchecked <i>tainted</i> value does immediately
     warrant a division by zero warning, because the attacker is free
     to pass zero as an input. Therefore the static analyzer's taint
     infrastructure consists of several parts: there is a mechanism for keeping
     track of tainted symbols in the symbolic program state, there is a way to
     define new sources of taint, and a few path-sensitive checks were taught to
     consume taint information to emit additional warnings (like the division
     by zero checker), acting as taint "sinks" and defining checker-specific
     "sanitization" conditions.
   </p>
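   <p>
     The division example can be sketched in a few lines of Python. This is a toy model, not
     the static analyzer's implementation: the <tt>State</tt> class, its <tt>tainted</tt> and
     <tt>nonzero</tt> sets, and the <tt>check_division</tt> function are hypothetical names
     chosen for illustration.
   </p>

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Toy program state: symbols are plain strings (hypothetical model)."""
    tainted: set = field(default_factory=set)  # symbols obtained from taint sources
    nonzero: set = field(default_factory=set)  # symbols already checked against zero


def check_division(state, divisor):
    """Return a warning for a division by `divisor`, or None if none is warranted."""
    if divisor in state.nonzero:
        # The value was checked ("sanitized") on this path, so it cannot be zero.
        return None
    if divisor in state.tainted:
        # The attacker controls the value and is free to pass zero.
        return f"division by tainted value '{divisor}' that may be zero"
    # Unknown but untainted: it is not known whether the value can be zero.
    return None
```

   <p>
     With <tt>n</tt> marked as tainted, <tt>check_division</tt> warns about <tt>x / n</tt>
     but stays silent for an untainted unknown <tt>m</tt>, and stops warning about
     <tt>n</tt> once it has been added to <tt>nonzero</tt>.
   </p>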
   <p>
     The entire facility is flagged as experimental: it's basically a
     proof-of-concept implementation. It's likely that it can be made to work
     really well, but it needs to go through some quality control by running it
     on real-world source code, and a number of bugs need to be addressed,
     especially in individual checks, before we can declare it stable.
     Additionally, the tastiest check of them all – buffer overflow detection
     based on tainted loop bounds or size parameters – was never implemented.
     There is also a related check for array access with tainted index – which
     is, again, experimental; let's see if we can declare this one stable
     as well!
   </p>

   <p><b>Expected result:</b>
     A number of taint-related checks either enabled by default for all users
     of the static analyzer, or available as opt-in for users who care about
     security. They're confirmed to have low false positive rate on real-world
     code. Hopefully, the buffer overflow check is one of them.</p>

   <p><b>Desirable skills:</b>
     Intermediate C++ to be able to understand LLVM code. We'll run our analysis
     on some plain C code as well. Some background in compilers or security is
     welcome but not strictly necessary.
   </p>

   <p><b>Project size:</b> Either medium or large.</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Confirmed Mentors:</b>
   <a href="https://github.com/haoNoQ">Artem Dergachev</a>,
   <a href="https://github.com/xazax-hun">Gábor Horváth</a>,
   <a href="https://github.com/ziqingluo-90">Ziqing Luo</a>
   </p>

   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-improve-and-stabilize-the-static-analyzers-taint-analysis-checks/68235">URL</a></p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_mlgo_passes_2023">Machine Learning Guided Ordering of Compiler Optimization Passes</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b>
   This continues the work of GSoC 2020 and <a href="https://summerofcode.withgoogle.com/archive/2021/projects/6411038932598784">2021</a>.
   </p>
   <p>
   Developers generally use standard optimization pipelines like -O2 and -O3 to
   optimize their code. Manually crafted heuristics are used to determine which
   optimization passes to select and how to order the execution of those passes.
   However, this process is not tailored for a particular program, or kind
   of program, as it is designed to perform “reasonably well” for any input.
   </p>
   <p>
   We want to improve the existing heuristics or replace the heuristics with
   machine learning-based models so that the LLVM compiler can provide a superior
   order of the passes customized per program.
   </p>
   <p>
   The last milestone enabled feature extraction, and started investigating training
   a policy for selecting a more appropriate pass pipeline.
   </p>
   <p><b>Project size:</b> either 175 or 350 hr.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Skills:</b> C/C++, some compiler experience. ML experience is a bonus.</p>
   <p><b>Expected outcomes</b>: Pre-trained model selecting the most economical
     optimization pipeline, with no loss in performance; hook-up of model in LLVM;
     (re-)training tool; come up with new optimization sequences through search or learning.</p>
   <p><b>Mentors</b>
     Tarindu Jayatilaka, Mircea Trofin, Johannes Doerfert
   </p>
   <p>
   <b>Discourse</b>
   <a href="https://discourse.llvm.org/t/machine-learning-guided-ordering-of-compiler-optimization-passes/60415">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_code_coverage">Support a hierarchical directory structure in generated coverage html reports</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b><br>
     Clang supports source-based coverage that shows which lines of code are covered by the executed tests
     <a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html">[1]</a>.
     It uses llvm-profdata <a href="https://llvm.org/docs/CommandGuide/llvm-profdata.html">[2]</a> and
     llvm-cov <a href="https://llvm.org/docs/CommandGuide/llvm-cov.html">[3]</a> tools to generate coverage reports.
     llvm-cov currently generates a single top-level index HTML file.
     For example, a single top-level directory code coverage report
     <a href="https://lab.llvm.org/coverage/coverage-reports/index.html">[4]</a>
     for LLVM repo is published on a coverage bot.
     Top-level indexing causes rendering scalability issues in large projects,
     such as Fuchsia <a href="https://fuchsia.dev">[5]</a>.
     The goal of this project is to generate a hierarchical directory structure in generated coverage html reports
     to match the directory structure and solve scalability issues.
     Chromium uses its own post-processing tools to show a per-directory hierarchical structure for coverage results
      <a href="https://analysis.chromium.org/coverage/p/chromium">[6]</a>.
     Similarly, Lcov, which is a graphical front-end for Gcov <a href="https://gcc.gnu.org/onlinedocs/gcc/Gcov.html">[7]</a>,
     provides a one-level directory structure to display coverage results <a href="https://llvm.org/reports/coverage/index.html">[8]</a>. <br>
     [1] <a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html">Source-based code coverage</a><br>
     [2] <a href="https://llvm.org/docs/CommandGuide/llvm-profdata.html">llvm-profdata</a><br>
     [3] <a href="https://llvm.org/docs/CommandGuide/llvm-cov.html">llvm-cov</a><br>
     [4] <a href="https://lab.llvm.org/coverage/coverage-reports/index.html">LLVM coverage reports</a><br>
     [5] <a href="https://fuchsia.dev">Fuchsia</a><br>
     [6] <a href="https://analysis.chromium.org/coverage/p/chromium">Coverage summary for Chromium</a><br>
     [7] <a href="https://gcc.gnu.org/onlinedocs/gcc/Gcov.html">Gcov</a><br>
     [8] <a href="https://llvm.org/reports/coverage/index.html">Lcov coverage reports</a><br>
     [9] <a href="https://github.com/llvm/llvm-project/issues/54711">Issue #54711: Support per-directory index files for HTML coverage report</a>
   </p>
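   <p>As a rough illustration of the per-directory roll-up such a report needs, the Python
     sketch below aggregates per-file line counts into totals for every enclosing directory.
     It is not llvm-cov code: the <tt>directory_totals</tt> function and its input format
     (a mapping from file path to covered and total line counts) are assumptions made for
     this example.</p>

```python
import posixpath
from collections import defaultdict


def directory_totals(file_coverage):
    """Aggregate {path: (covered, total)} into {directory: [covered, total]}.

    Every file contributes to each of its ancestor directories; files at the
    top level (and all others, transitively) are counted under ".".
    """
    totals = defaultdict(lambda: [0, 0])
    for path, (covered, total) in file_coverage.items():
        directory = posixpath.dirname(path)
        while True:
            entry = totals[directory or "."]
            entry[0] += covered
            entry[1] += total
            parent = posixpath.dirname(directory)
            if parent == directory:  # reached the top ("" or "/")
                break
            directory = parent
    return dict(totals)
```

   <p>Each directory's index page could then list only its immediate children together with
     these totals, instead of one top-level page listing every file.</p>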
   <p><b>Expected result:</b> Implement support for a hierarchical directory structure in generated coverage HTML reports and demonstrate the feature in the LLVM repo code coverage reports.</p>
   <p><b>Project size:</b> Medium or Large</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/gulfemsavrun">Gulfem Savrun Yeniceri</a>
     <a href="https://github.com/petrhosek">Petr Hosek</a>
   </p>
   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/coverage-support-a-hierarchical-directory-structure-in-generated-coverage-html-reports/68239">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_map_value_to_src_expr">Map LLVM values to corresponding source-level expressions</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>

   Developers often use compiler generated remarks and analysis reports to optimize their code. While
   compilers in general are good at including source code positions (i.e., line and column numbers) in the
   generated messages, it is useful if these generated messages also include the corresponding source-level
   expressions. The approach used by the LLVM implementation is to use a small set of intrinsic functions
   to define a mapping between LLVM program objects and the source-level expressions. The goal of this
   project is to use the information included within these intrinsic functions to either generate the
   source expression corresponding to LLVM values or to propose and implement solutions to get the same if
   the existing information is insufficient. Optimizing memory accesses in a program is important for
   application performance. We specifically intend to use compiler analysis messages that report
   source-level memory accesses corresponding to the LLVM load/store instructions that inhibit compiler
   optimizations. As an example, we can use this information to report memory access dependences that
   inhibit vectorization.
   </p>

   <p><b>Project size:</b> Medium</p>

   <p><b>Difficulty:</b> Medium</p>

   <p><b>Skills:</b> Intermediate C++, familiarity with LLVM core or willingness to learn the same.</p>

   <p><b>Expected result:</b> Provide an interface which takes an LLVM value and returns a string corresponding
     to the equivalent source-level expression. We are especially interested in using this interface to map
     addresses used in load/store instructions to equivalent source-level memory references.</p>

   <p><b>Confirmed Mentors:</b>
     Satish Guggilla (satish.guggilla@intel.com)
     Karthik Senthil (karthik.senthil@intel.com)
   </p>

   <p>
   <b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/map-llvm-values-to-corresponding-source-level-expressions/68450">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="clangir">Build and run SingleSource benchmarks using ClangIR</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project:</b><br>
     Clang codegen works by emitting LLVM IR using AST visitors. In the
     <a href="https://llvm.github.io/clangir/">ClangIR</a> project, we emit ClangIR (CIR)
     from AST visitors too (CIRGen), and then lower to (a) LLVM IR directly or, alternatively,
     (b) MLIR in-tree dialects. Lowering to LLVM is still quite immature and lacks many
     instructions, attributes and metadata support.

     ClangIR would greatly benefit from some level of parity with Clang AST →
     LLVM IR codegen quality, in both performance and build time. This is key
     for incrementally bridging correctness and performance testing, providing a
     baseline for future higher level optimizations on top of C/C++.

     A good starting point is to build and run simple benchmarks,
     measuring both generated code and build time performance. LLVM's llvm-test-suite contains scripts and
     machinery that make it easy to check correctness and collect performance data, and its
     <a href="https://github.com/llvm/llvm-test-suite/tree/main/SingleSource">SingleSource</a>
     collection provides a set of simpler programs to build.

     In a nutshell, while working on this project the student will bridge the
     gap of CIR → LLVM lowering, and at times fix any lacking Clang AST → CIR
     support. The work is going to be done incrementally on top of SingleSource
     benchmarks, while measuring compiler build time and the performance of compiled
     programs.
   </p>

   <p><b>Skills:</b>
     Intermediate C++ programming skills; familiarity with compilers, LLVM IR,
     MLIR or Clang is a big plus, but a willingness to learn them is also
     acceptable.
   </p>
   <p><b>Expected result:</b> Build and run programs from the SingleSource subdirectory of the
   llvm-test-suite, then collect and present results (performance and build time) against regular (upstream) Clang codegen.</p>
   <p><b>Project size:</b> Large</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/bcardosolopes">Bruno Cardoso Lopes</a>
     <a href="https://github.com/lanza">Nathan Lanza</a>
   </p>
   <p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clangir-build-and-run-singlesource-benchmarks-using-clangir/68473">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_tblgen_extension">Move additional Enzyme Rules to Tablegen</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Enzyme performs automatic differentiation (in the calculus sense) of LLVM programs. This enables users to use Enzyme to perform various algorithms such as back-propagation in ML or scientific simulation on existing code for any language that lowers to LLVM. The support for an increasing number of LLVM Versions (7-main), AD modes (Reverse, Forward, Forward-Vector, Reverse-Vector, Jacobian), and libraries (BLAS, OpenMP, MPI, CUDA, ROCm, ...) leads to a steadily increasing code base. In order to limit complexity and help new contributors we would like to express more parts of our core logic using LLVM Tablegen. The applicant is free to decide how to best map the program transformation abstractions within Enzyme to Tablegen.
   </p>
   <p><b>Expected results:</b>
      1. Extend the tablegen rule generation system within Enzyme to cover a new component besides the AdjointGenerator.
      <br/>
      2. Move several existing rules to the new autogenerated system (e.g. LLVM instructions, LLVM intrinsics, MPI calls, ...).
      <br/>
    </p>

   <p><b>Confirmed mentors:</b>
     <a href="https://github.com/zuseZ4">Manuel Drehwald</a>,
     <a href="mailto:wmoses@mit.edu">William Moses</a>
   </p>
   <p><b>Desirable skills:</b>
     Good knowledge of C++, calculus, and LLVM and/or Clang, and/or MLIR internals. Experience with Tablegen, Enzyme or automatic differentiation would be nice, but can also be learned in the project.
   </p>
   <p><b>Project size:</b> Large</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Discourse</b> <a href="https://discourse.llvm.org/t/enzyme-move-additional-enzyme-rules-to-tablegen/69738">URL</a></p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_patch_coverage">Patch based test coverage for quick test feedback</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>
     Most of the day to day tests in LLVM are regression tests executed by <a href="https://llvm.org/docs/CommandGuide/lit.html">Lit</a>, structured as source code or IR to be passed to some binary, rather than test code directly calling the code to be tested.
     This has many advantages but can make it difficult to predict which code path is executed when the compiler is invoked with a certain test input, especially for edge cases where error handling is involved.
     The goal of this project is to help developers create good test coverage for their patch and enable reviewers to verify that they have done so.
     To accomplish this we would like to introduce a tool that takes a patch as input, adds coverage instrumentation for the affected source files, runs Lit tests, and records which test cases cause each counter to be executed.
     For each counter we can then report the number of test cases executing the counter, but perhaps more importantly we can also report the number of test cases executing the counter that are also changed in some way by the patch, since a modified line that results in the same test results isn’t properly tested, unless it’s intended to be a non-functional change.
     This can be implemented in three separate parts:

     <ol>
       <li>Adding an option to llvm-lit to emit the necessary test coverage data, divided per test case (involves setting a unique value to <a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program"><code>LLVM_PROFILE_FILE</code></a> for each RUN)
       <li>New tool to process the generated coverage data and the relevant git patch, and present the results in a user friendly manner
       <li>Adding a way to non-intrusively (without changing build configurations) enable coverage instrumentation to a build. By building the project normally, touching the files changed by the patch, and rebuilding with <a href="https://github.com/llvm/llvm-project/blob/93a1fc2e18b452216be70f534da42f7702adbe1d/clang/tools/driver/driver.cpp#L79-L105"><code>CCC_OVERRIDE_OPTIONS</code></a> set to <a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#compiling-with-coverage-enabled">add coverage</a> we can lower the overhead of generating and processing coverage of lines not relevant to the patch.
     </ol>

     The tooling in steps 2 and 3 can be made completely agnostic of the actual test-runner, lowering the threshold for other test harnesses than Lit to implement the same functionality.
     If time permits, adding this as a step in CI would also be helpful for reviewers.
   </p>

   <p><b>Project size:</b> Small or medium</p>

   <p><b>Difficulty:</b> Simple </p>

   <p><b>Skills:</b> Python for Lit, data processing and <a href="https://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html">diff</a> processing. No compiler experience necessary. </p>

   <p><b>Expected result:</b> Implement a new tool for use by the community. Developers get help finding uncovered edge cases during development, while also avoiding paranoid sprinkling of asserts or logs just to check that the code is actually executed. Reviewers can more easily check which parts of the patch are tested by each test. </p>

   <p><b>Confirmed Mentors:</b>
     <a href="https://github.com/hnrklssn">Henrik Olsson</a>
   </p>

   <p>
   <b>Discourse:</b>
   <a href="https://discourse.llvm.org/t/coverage-patch-based-test-coverage-for-quick-test-feedback/68628">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc22">Google Summer of Code 2022</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p>
     Google Summer of Code 2022 was very successful for the LLVM project. For the
     list of accepted and completed projects, please take a look at the Google
     Summer of
     Code <a href="https://summerofcode.withgoogle.com/archive/2022/organizations/llvm-compiler-infrastructure">website</a>.
   </p>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>LLVM</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_shared_jitlink">Implement a shared-memory based JITLinkMemoryManager for out-of-process JITting</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Write a shared-memory based JITLinkMemoryManager.
     <br />
     LLVM’s JIT uses the JITLinkMemoryManager interface to allocate both working
     memory (where the JIT fixes up the relocatable objects produced by the
     compiler) and target memory (where the JIT’d code will reside in the target).
     JITLinkMemoryManager instances are also responsible for transporting
     fixed-up code from working memory to target memory. LLVM has an existing
     cross-process allocator that uses remote procedure calls (RPC) to allocate
     and copy bytes to the target process; however, a more attractive solution
     (when the JIT and target process share the same physical memory) would be to
     use shared memory pages to avoid copies between processes.

   </p>
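   <p>
     The benefit of shared pages can be seen in miniature with two mappings of the same memory. The Python sketch below is not LLVM code and uses none of the JITLink APIs. It assumes a POSIX-like system, uses a temporary file as a stand-in for a shared memory object, and creates both views in a single process for simplicity. The names <code>working</code> and <code>target</code> are illustrative only.
   </p>

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryFile() as backing:
    backing.truncate(PAGE)  # reserve one page of backing storage

    # Two independent mappings of the same page. In a real out-of-process
    # JIT, the second mapping would be created in the target process.
    working = mmap.mmap(backing.fileno(), PAGE)
    target = mmap.mmap(backing.fileno(), PAGE)

    # Bytes written through the "working" view (placeholder for fixed-up code)
    # are visible through the "target" view without an explicit copy or RPC.
    working[:4] = b"\x90\x90\x90\xc3"
    seen = bytes(target[:4])

    working.close()
    target.close()

print(seen == b"\x90\x90\x90\xc3")
```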

   <p><b>Expected results:</b>
     <ul>Implement a shared-memory based JITLinkMemoryManager:
       <li>Write generic LLVM APIs for shared memory allocation.</li>
       <li>
         Write a JITLinkMemoryManager that uses these generic APIs to allocate
         shared working-and-target memory.
       </li>
       <li>Make an extensive performance study of the approach.</li>
     </ul>

   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
     <a href=https://github.com/lhames>Lang Hames</a></p>

   <p><b>Desirable skills:</b> Intermediate C++; Understanding of LLVM and the
     LLVM JIT in particular; Understanding of virtual memory management APIs.
   </p>

   <p><b>Project type:</b> Large</p>

   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/implement-a-shared-memory-based-jitlinkmemorymanager-for-out-of-process-jitting">
       URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_build_jit_tutorial">Modernize the LLVM "Building A JIT" tutorial series</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The LLVM BuildingAJIT tutorial series teaches readers to build their own JIT
     class from scratch using LLVM’s ORC APIs; however, the tutorial chapters have
     not kept pace with recent API improvements. Bring the existing tutorial
     chapters up to speed, write up a new chapter on lazy compilation (chapter
     code already available) or write a new chapter from scratch.
   </p>

   <p><b>Expected results:</b>
     <ul>
       <li>
         Update chapter text for Chapters 1-3 -- Easy, but offers a chance to get
         up-to-speed on the APIs.
       </li>
       <li>
         Write chapter text for Chapter 4 -- Chapter code is already available,
         but no chapter text exists yet.
       </li>
       <li>
         Write a new chapter from scratch -- E.g. How to write an out-of-process
         JIT, or how to directly manipulate the JIT'd instruction stream using
         the ObjectLinkingLayer::Plugin API.
       </li>
     </ul>

   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
     <a href=https://github.com/lhames>Lang Hames</a></p>
   <p><b>Desirable skills:</b> Intermediate C++; Understanding of LLVM and the
     LLVM JIT in particular; Familiarity with RST (reStructuredText); Technical
     writing skills.
   </p>
   <p><b>Project type:</b> Medium</p>

   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/modernize-the-llvm-building-a-jit-tutorial-series">
       URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_jit_new_format">Write JITLink support for a new format/architecture</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     JITLink is LLVM’s new JIT linker API -- the low-level API that transforms
     compiler output (relocatable object files) into ready-to-execute bytes in
     memory. To do this JITLink’s generic linker algorithm needs to be
     specialized to support the target object format (COFF, ELF, MachO), and
     architecture (arm, arm64, i386, x86-64). LLVM already has mature
     implementations of JITLink for MachO/arm64 and MachO/x86-64, and a
     relatively new implementation for ELF/x86-64. Write a JITLink implementation
     for a missing target that interests you. If you choose to implement support
     for a new architecture using the ELF or MachO formats then you will be able
     to re-use the existing generic code for these formats. If you want to
     implement support for a new target using the COFF format then you will need
     to write both the generic COFF support code and the architecture support
     code for your chosen architecture.
   </p>

   <p><b>Expected results:</b>
     Write a JITLink specialization for a not-yet-supported format/architecture.
   </p>
   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
     <a href=https://github.com/weliveindetail>Stefan Gränitz</a>,
     <a href=https://github.com/lhames>Lang Hames</a>
   </p>
   <p><b>Desirable skills:</b> Intermediate C++; Understanding of LLVM and the
     LLVM JIT in particular; familiarity with your chosen format/architecture,
     and basic linker concepts (e.g. sections, symbols, and relocations).
   </p>
   <p><b>Project type:</b> Large</p>

   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/write-jitlink-support-for-a-new-format-architecture">
       URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_instrumentaion_for_compile_time">Instrumentation of Clang/LLVM for Compile Time</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Every developer, at some point (usually while waiting for their program to
     compile), has asked "Why is it taking so long?"  This project is to seek an
     answer to this question.  There exists within LLVM, and by extension Clang,
     a timing infrastructure that records events within the compiler.  However,
     its utilization is inconsistent and insufficient.  This can be improved by
     adding more instrumentation throughout LLVM and Clang, but one must be careful.
     Too much instrumentation, or instrumenting the wrong things, can be confusing
     and overwhelming, making it no more useful than too little information.
     The trick is to find the right places to instrument and to control the
     instrumentation.  Seeking out these key spots will take you through the
     entire compilation process, from preprocessing through to final code
     generation, and all phases between.  As you instrument the code, you will
     look at the data as you evolve it, which will further direct your search.
     You will develop new ways to control and filter the information to allow a
     better understanding of where the compiler is spending its time.  You will
     seek out and develop example test inputs that illustrate where the compiler
     can be improved, which will in turn, help direct your instrumenting and search.
     You will consider and develop ways of controlling the instrumentation to
     allow better understanding and detailed examination of phases of compilation.
     Through all of this, you will gain an understanding of how a compiler works,
     from front end processing, through the LLVM optimization pipeline, through
     to code generation.  You will see, and understand, the big picture of what
     is required to compile and optimize a C/C++ program, and in particular, how
     Clang, LLVM and llc accomplish these tasks.  Your mentors have a combined
     experience of approximately 25 years of compiler development and around 8
     years of experience with LLVM itself to help you on your quest.
   </p>

   <p><b>Expected results:</b>
     <ul>
       <li>Targeted expansion of the use of the existing timing infrastructure</li>
       <li>Identification of appropriate test inputs for improving compile time</li>
       <li>Identification of compile time hotspots</li>
       <li>New and improved methods of controlling the timing infrastructure</li>
     </ul>
   </p>
   <p><b>Confirmed Mentor:</b> Jamie Schmeiser, Whitney Tsang</p>
   <p><b>Desirable skills:</b> C++ programming skills; Clang/LLVM knowledge an asset but not necessary; self-motivated; curiosity; desire to learn</p>
   <p><b>Project type:</b> 175 or 350 hour</p>
   <p><b>Difficulty Rating:</b> Easy - Medium</p>
   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/instrumentation-of-clang-llvm-for-compile-time">
       URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_mlgo_passes">Machine Learning Guided Ordering of Compiler Optimization Passes</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>
   This continues the work of GSoC 2020 and <a href="https://summerofcode.withgoogle.com/archive/2021/projects/6411038932598784">2021</a>.

   Developers generally use standard optimization pipelines like -O2 and -O3 to
   optimize their code. Manually crafted heuristics are used to determine which
   optimization passes to select and how to order the execution of those passes.
   However, this process is not tailored for a particular application, or kind
   of application, as it is designed to perform “reasonably well” for any input.

   We want to improve the existing heuristics or replace the heuristics with
   machine learning-based models so that the LLVM compiler can provide a superior
   order of the passes customized per application.

   The last milestone enabled feature extraction, and started investigating training
   a policy for selecting a more appropriate pass pipeline.
   </p>
   <p><b>Project size:</b> either 175 or 350 hr.</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Skills:</b> C/C++, some compiler experience. ML experience is a bonus.</p>
   <p><b>Expected outcomes</b>: Pre-trained model selecting the most economical
     optimization pipeline, with no loss in performance; hook-up of model in LLVM;
     (re-)training tool.</p>
   <p><b>Mentors</b>
     Tarindu Jayatilaka, Mircea Trofin, Johannes Doerfert
   </p>
   <p>
   <b>Discourse</b>
   <a href="https://discourse.llvm.org/t/machine-learning-guided-ordering-of-compiler-optimization-passes/60415">URL</a>
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_mlgo_loop">Learning Loop Transformation Policies</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>
   This project is a continuation of last <a href="https://summerofcode.withgoogle.com/archive/2021/projects/5732097817313280">year’s</a>.
   In 2021, the project achieved its first milestone - separating correctness
   decisions from policy decisions. This opens up the possibility of replacing
   the latter with machine-learned ones.

   Rough milestones: 1) select an initial set of features and use the existing ML
   Guided Optimizations (MLGO) infra to generate training logs; 2) define a reward
   signal, computable at compile time, to guide a reinforcement learning training loop;
   3) iterate through training and refine reward/feature set
   </p>
   <p><b>Project size:</b> either 175 or 350 hr, ideally 350 hr</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Skills:</b> C/C++, some compiler experience. ML experience is a bonus.</p>
   <p><b>Expected outcomes</b>: policy ('advisor') interface for loop unrolling,
     with current heuristic as default implementation; set up feature extraction
     for reinforcement learning training; set up a reward metric; set up training
     algorithm, and iterate over policy training</p>

   <p><b>Mentors</b>
     Johannes Doerfert, Mircea Trofin
   </p>
   <p>
   <b>Discourse</b>
   <a href="https://discourse.llvm.org/t/learning-loop-transformation-policies/60413">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_module_inliner">Evaluate and Expand the Module-Level Inliner</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>
     LLVM's inliner is a bottom-up, strongly-connected component-level pass. This
     places limits on the order in which call sites are evaluated, which impacts
     the effectiveness of inlining.

     We now have a functional Module Inliner, as a result of <a href="https://summerofcode.withgoogle.com/archive/2021/projects/5195658885070848">GSoC2021 work</a>.
     We want to explore call site priority schemes, effectiveness/frequency of running
     function passes after successful inlinings, interplay with the ML inline
     advisor, to name a few areas of exploration.
   </p>
   <p><b>Project size:</b> either 175 or 350 hr, ideally 350 hr, milestones allow
     for 175hr scoping</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Skills:</b> C/C++, some compiler experience.</p>
   <p><b>Expected outcomes</b>: Proposal and Evaluation of alternative traversal
     orders; evaluation of 'clustering' inlining decisions (inline more than one
     call site at a time); evaluation of effectiveness/frequency of function
     optimization passes after inlining
   </p>
   <p><b>Mentors</b>
     Kazu Hirata, Liqiang Tao, Mircea Trofin
   </p>
   <p>
   <b>Discourse</b>
   <a href="https://discourse.llvm.org/t/evaluate-and-expand-the-module-level-inliner/60525">URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_lto_dependency_info">Richer symbol dependency information for LTO</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     C and C++ programs are often composed of various object files produced from
     separately-compiled source files that are then linked together.
     When compiling one source file, knowledge that can be derived from the logic
     contained within the other source files would normally not be available.
     Link-time optimization, also known as LTO, is a way for optimization to be
     done using information from more than one source file.
   </p>
   <p>In LLVM, LTO is achieved by using LLVM bitcode objects as the output from
     the "compile" step and feeding those objects into the link step.
     LLVM's LTO operates in conjunction with the linker.
     The linker is invoked by the user and the linker in turn drives LLVM's LTO
     when it encounters LLVM bitcode files, getting information from LTO about
     what symbols a bitcode object defines or references.
     Information about what symbols are defined in or referenced from an object
     is necessary for the linker to perform symbol resolution, and a linker is
     normally able to extract such information from regular (non-bitcode) object
     files.
   </p>
   <p>The implied consequences of LLVM's LTO implementation
     with respect to linker GC
     (linker garbage collection) can be improved, especially for aggressive forms
     of linker GC with lazy inclusion of objects and sections.
     In particular, the symbols referenced but undefined by an LTO module are,
     to the linker, monolithic at the module level.
     At the same time, the symbols referenced but undefined by regular
     (non-LTO) objects are monolithic to LTO.
     Together, this means that the inclusion of an LTO module
     into the overall process potentially leads, in the linker's initial symbol
     resolution, to all the undefined symbols in that module being considered as
     referenced; in turn, additional artifacts (e.g., archive members) may be
     added into the resolution, which further leads to references that may
     resolve to symbols defined in LTO modules and a premature conclusion that
     the definitions of these symbols are needed.
     This at least means potentially unnecessary codegen is being done for
     functions that will be garbage-collected in the end (waste of electricity
     and time).
   </p>
   <p>We acknowledge that an ideal implementation probably involves a "coroutine"
     like interaction between the linker and LTO codegen where information flows
     back and forth; however, such an endeavour is invasive to both linkers and
     to LLVM.
   </p>
   <p>We believe that by</p>
   <ul>
     <li>having the linker register, via an API to LTO, symbol reference "nodes"
     modelling the relationship between a symbol and the symbols that are
     referenced in turn from (the object file section containing) its
     linker-selected definition, and
     </li>
     <li>using that information in LTO processing,</li>
   </ul>
   <p>the LTO processing will be able to effectively identify a more accurate set
     of LTO symbols that are visible outside of the LTO unit.
     The linker merely needs to identify only exported symbols and entry points
     (such as the entry point for an executable and functions involved in
     initialization and finalization).
   </p>
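   <p>
     The effect of the symbol reference nodes can be pictured as graph reachability from the entry points. The Python sketch below is an illustration under stated assumptions, not the LTO interface: the symbol names, the <code>references</code> dictionary, and the helper <code>reachable_symbols</code> are all hypothetical.
   </p>

```python
from collections import deque

def reachable_symbols(references, roots):
    """Return every symbol reachable from roots by following reference edges."""
    needed = set(roots)
    work = deque(roots)
    while work:
        symbol = work.popleft()
        for ref in references.get(symbol, ()):
            if ref not in needed:
                needed.add(ref)
                work.append(ref)
    return needed

# Hypothetical reference edges registered by the linker for definitions
# that live in regular (non-LTO) objects.
references = {
    "main": ["lto_helper"],
    "unused_regular_fn": ["lto_expensive"],  # never reached from main
}
lto_defined = {"lto_helper", "lto_expensive"}

needed = reachable_symbols(references, roots=["main"])
visible_to_regular = needed & lto_defined
print(sorted(visible_to_regular))  # ['lto_helper']
```

   <p>
     Treating the undefined symbols of regular objects as monolithic would mark both LTO symbols as visible. Following the edges from <code>main</code> keeps only <code>lto_helper</code>, so no codegen is needed for <code>lto_expensive</code> in this made-up example.
   </p>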
   <p>Having the LLVM opt/codegen understand the dependency implications from the
     "outside world" is strictly better than the other direction: the symbols
     referred to by relocations in non-LTO code are pretty much fixed as compiled
     (whereas some references in LTO code may disappear with optimization).
   </p>

   <p><b>Expected results:</b></p>
   <ol>
     <li>Modification of the C++ LTO interface used by LLD to implement an
       interface to record the symbol reference dependency data (incorporating
       awareness of sections and comdats). This may additionally include a
       method to add LTO objects provisionally, simulating behaviours where
       linkers only add objects as needed.
     </li>
     <li>
       Modification of LTO to use new symbol reference information
       for definitions in regular objects when visiting definitions
       in the IR prior to the
       internalization pass to discover (transitive) symbol references and
       record the so-referenced symbols as being visible to regular objects.
       This may additionally include the "late" incorporation of LTO objects
       added provisionally into the merged LTO module.
     </li>
     <li>Modification of LLD (for ELF) to modify initial resolution to use the
       new interface as a replacement for setting
       <code>VisibleToRegularObj</code>
       except for entry point functions (including C++ dynamic initialization and
       finalization).
     </li>
   </ol>

   <p><b>Confirmed Mentors:</b>
     Sean Fertile,
     Hubert Tong,
     Wael Yehia
   </p>

   <p><b>Desirable skills:</b>
     Intermediate C++;
     basic linker concepts (e.g., symbols, sections, and relocations)
   </p>

   <p><b>Project size:</b> 350 hours</p>

   <p><b>Difficulty:</b> Medium/Hard</p>

   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/richer-symbol-dependency-information-for-lto/60335">
       URL</a>
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_undef_load">Remove undef: move uninitialized memory to poison</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>
     The existence of the undef value in LLVM prevents several optimizations,
     even in programs where it is not used. Therefore, we have been trying to
     move all uses of undef to poison so we can eventually remove undef from
     LLVM.<br/>
     This project focuses on uninitialized memory: right now the semantics of
     LLVM is that loading a value from uninitialized memory yields an undef value.
     This prevents, for example, SROA/mem2reg from optimizing conditional loads
     as phi(undef, %x) cannot be replaced with %x, as %x might be poison.<br/>
     This project consists of devising a consistent semantics for uninitialized
     memory (based on existing proposals), an upgrade plan for LLVM, and implementing
     the changes in LLVM and clang.
     In clang the changes should be specific to bit-fields.<br/>
     For more information see the following
     <a href="https://github.com/llvm/llvm-project/issues/52930">discussion</a>
     and/or contact the mentor.<br/>
     Further reading:
     <a href="https://web.ist.utl.pt/nuno.lopes/pubs/llvmmem-oopsla18.pdf">introduction to LLVM's memory model</a>.
   </p>
   <p><b>Project size:</b> 350 hr</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Skills:</b> Intermediate C++</p>
   <p><b>Expected outcomes</b>:
     <ul>
       <li>Semantics for memory operations that removes the need for undef
         values</li>
       <li>Upgrade plan for LLVM and frontends</li>
       <li>Implementation of the proposed semantics in LLVM</li>
       <li>Implementation of auto-upgrade path for old LLVM IR files</li>
       <li>Implementation of fixes in clang to use the new IR features</li>
       <li>Benchmarking to check for regressions and/or perf improvements</li>
     </ul>
   </p>
   <p><b>Mentors:</b>
     <a href="https://web.ist.utl.pt/nuno.lopes/">Nuno Lopes</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a id="llvm_abi_export">Add API/ABI export annotations to the LLVM build</a>
 </div>
 <!-- *********************************************************************** -->
 <div class="www_text">
   <p><b>Description of the project</b>
 <p>Currently, all libraries inside LLVM export all their symbols publicly. When
 linking statically against them, the linker will remove unused symbols and this
 is not a problem.</p>

 <p>When the libraries are built as shared libraries however, the number of exported
 symbols is very large and symbols that are meant to be internal spill into the
 public ABI of the shared libLLVM.so.</p>

 <p>In this project, we’d like to change the default visibility of library symbols
 to “hidden”, add an annotation macro to LLVM and use the macro to gradually move
 the entire library in this direction. This will eventually enable building the
 shared libLLVM.so on Windows as well.</p>

 <p>In practice, this means adding -fvisibility=hidden to individual libraries and
 annotating exported symbols with the LLVM export annotation.</p>

 <p>We would like this work to be as unintrusive to other developers’ workflows as
 possible, so starting with a small internal library would be beneficial,
 e.g. one of the LLVM targets or IR passes.</p>

 <p>For further reading, there is a Discourse thread available that discusses the
 idea behind this proposal:
 <a href="https://discourse.llvm.org/t/supporting-llvm-build-llvm-dylib-on-windows/58891">
   Supporting LLVM_BUILD_LLVM_DYLIB on Windows</a>
 as well as the linked Phabricator review with a patch implementing the functionality:
 <a href="https://reviews.llvm.org/D109192">⚙ D109192 [WIP/DNM] Support:
   introduce public API annotation support</a>
 None of this work has been committed yet but can be used as a starting point
 for this proposal.</p>
   </p>
   <p><b>Project size:</b> Medium</p>
   <p><b>Difficulty:</b> Easy</p>
   <p><b>Skills:</b> Build systems, CMake, LLVM</p>
   <p><b>Expected outcomes</b>:
     <ul>
       <li>Export macro implemented and committed to LLVM</li>
       <li>At least one internal target ported to the new export scheme</li>
     </ul>
   </p>
   <p><b>Mentors:</b>
     Timm Bäder, Tom Stellard
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>Clang</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-template-instantiation-sugar">Extend clang AST to provide
     information for the type as written in template instantiations.</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b>
     When instantiating a template, the template arguments are canonicalized
     before being substituted into the template pattern. Clang does not preserve
     type sugar when subsequently accessing members of the instantiation.

     <pre>
     std::vector&lt;std::string&gt; vs;
     int n = vs.front(); // bad diagnostic: [...] aka 'std::basic_string&lt;char&gt;' [...]

     template&lt;typename T&gt; struct Id { typedef T type; };
     Id&lt;size_t&gt;::type // just 'unsigned long', 'size_t' sugar has been lost
     </pre>

     Clang should "re-sugar" the type when performing member access on a class
     template specialization, based on the type sugar of the accessed
     specialization. The type of vs.front() should be std::string, not
     std::basic_string&lt;char, [...]&gt;.
     <br /> <br />
     Suggested design approach: add a new type node to represent template
     argument sugar, and implicitly create an instance of this node whenever a
     member of a class template specialization is accessed. When performing a
     single-step desugar of this node, lazily create the desugared representation
     by propagating the sugared template arguments onto inner type nodes (and in
     particular, replacing Subst*Parm nodes with the corresponding sugar). When
     printing the type for diagnostic purposes, use the annotated type sugar to
     print the type as originally written.
     <br /> <br />
     For good results, template argument deduction will also need to be able to
     deduce type sugar (and reconcile cases where the same type is deduced twice
     with different sugar).
   </p>

   <p><b>Expected results: </b>
     Diagnostics preserve type sugar even when accessing members of a template
     specialization. T&lt;unsigned long&gt; and T&lt;size_t&gt; are still the
     same type and the same template instantiation, but
     T&lt;unsigned long&gt;::type single-step desugars to 'unsigned long' and
     T&lt;size_t&gt;::type single-step desugars to 'size_t'.</p>

   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
     <a href=https://github.com/zygoloid>Richard Smith</a></p>

   <p><b>Desirable skills:</b>
     Good knowledge of clang API, clang's AST, intermediate knowledge of C++.
   </p>
   <p><b>Project type:</b> Large</p>

   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/clang-extend-clang-ast-to-provide-information-for-the-type-as-written-in-template-instantiations">
       URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-sa-structured-bindings">Implement support for
     C++17 structured bindings in the Clang Static Analyzer</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b>
     Even though a lot of new C++ features are supported by the static analyzer
     automatically by the virtue of clang AST doing all the work under the hood,
     the C++17 "structured binding" syntax
     <pre>    auto [x, y] = ...;</pre>
     requires some extra work on the Static Analyzer side. The analyzer's transfer functions
     need to be taught about the new AST nodes, <a href="https://clang.llvm.org/doxygen/classclang_1_1BindingDecl.html">BindingDecl</a>
     and <a href="https://clang.llvm.org/doxygen/classclang_1_1DecompositionDecl.html">DecompositionDecl</a>,
     to work correctly in all <a href="https://en.cppreference.com/w/cpp/language/structured_binding">three interpretations</a>
     described by the Standard.
     <br /><br />
     Incomplete support for structured bindings is a common source of
     false positives in the uninitialized variable checker on modern C++ code,
     such as <a href="https://github.com/llvm/llvm-project/issues/42387">#42387</a>.
     <br /><br />
     It is likely that the Clang CFG also needs to be updated. Such changes in
     the CFG may improve quality of clang warnings outside of
     the Static Analyzer.
   </p>
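   <p>For reference, the three interpretations can be sketched as follows.
     The function and type names are invented for this example. In each case
     every binding is initialized, which is what the analyzer should
     conclude as well:</p>

```cpp
#include <tuple>
#include <utility>

// Case 1: binding to the elements of an array.
inline int sumArray() {
  int arr[2] = {1, 2};
  auto [a, b] = arr; // copies the array, then names its elements
  return a + b;
}

// Case 2: binding to a tuple-like type (std::tuple_size / get protocol).
inline int sumTuple() {
  std::pair<int, int> p{3, 4};
  auto [first, second] = p;
  return first + second;
}

// Case 3: binding to the public non-static data members of a class.
struct Point {
  int x;
  int y;
};

inline int sumMembers() {
  Point pt{5, 6};
  auto [x, y] = pt;
  return x + y;
}
```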

   <p><b>Expected results: </b>
     The Static Analyzer correctly models structured binding and decomposition
     declarations. In particular, binding variables no longer appear
     uninitialized to the Static Analyzer's uninitialized variable checker.</p>

   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/haoNoQ>Artem Dergachev</a>,
     <a href=https://github.com/t-rasmud>Rashmi Mudduluru</a>,
     <a href=https://github.com/xazax-hun>Gábor Horváth</a>,
     <a href=https://github.com/Szelethus>Kristóf Umann</a>
   </p>

   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++. Some familiarity with Clang AST and/or
     some static analysis background.
   </p>
   <p><b>Project size:</b> 350 hr</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/implement-support-for-c-17-structured-bindings-in-the-clang-static-analyzer/60588">
       URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-improve-diagnostics">Improve Clang Diagnostics.</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description:</b>
   Clang diagnostics, the warnings and errors issued to the programmer, are a critical
   feature of the compiler. Great diagnostics can have a significant impact on the
   user experience of the compiler and increase programmer productivity.
   </p>

   <p>Recent improvements in GCC
   <a href="https://developers.redhat.com/blog/2018/03/15/gcc-8-usability-improvements"> [1] </a>
   <a href="https://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9/"> [2] </a>
   show that there is significant headroom to improve diagnostics
   (and user interactions in general). It would be a very impactful project
   to survey and identify all the possible improvements to clang on this
   topic and start redesigning the next generation of our diagnostics.
   </p>

   <p>
   In addition, we will triage the issues reported on the LLVM GitHub issue tracker labeled
   with <a href="https://github.com/llvm/llvm-project/labels/clang%3Adiagnostics">clang:diagnostics</a>.
   For issues that need fixing we will prepare patches; the rest we will simply close.
   </p>

   <p><b>Expected outcomes</b>:
   Diagnostics will be improved:
     <ul>
       <li>Improve diagnostic aesthetics</li>
       <li>Cover missing diagnostics</li>
       <li>Reduce false positive rate</li>
       <li>Reword diagnostics</li>
     </ul>
   </p>

   <p><b>Confirmed Mentor:</b>
     <a href=https://github.com/AaronBallman>Aaron Ballman</a>,
     <a href=https://github.com/erichkeane>Erich Keane</a>,
     <a href=https://github.com/xgupta>Shivam Gupta</a></p>

   <p><b>Desirable skills:</b> C++ coding experience</p>
   <p><b>Project type:</b> Large/350 hr</p>

   <p><b>Discourse</b>
     <a href="https://discourse.llvm.org/t/improve-clang-diagnostics/61521">
       URL</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>Polly</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="polly_npm">Complete switch to new pass manager</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the Project:</b>
     While the standard Polly-enabled -O1/-O2/-O3 optimization pass pipelines work fine with the <a href="https://blog.llvm.org/posts/2021-03-26-the-new-pass-manager/">New Pass Manager</a> (NPM), some parts of Polly still only work with the legacy pass manager.
     This includes some passes such as -polly-export-jscop/-polly-import-jscop, regression testing, Polly-ACC, command line options such as -polly-show, and the PassInstrumentation mechanism used by e.g. -print-after-all.
     LLVM (and Clang) have moved to the NPM as the default; support for the legacy pass manager is deprecated, is slowly degrading, and its features are being removed.
     Therefore, all of Polly's functionality should eventually work with the NPM as well, and Polly should be prepared for the complete removal of the legacy pass manager.
     More details about the two pass managers can be found <a href="https://github.com/banach-space/llvm-tutor#about-pass-managers-in-llvm">here</a>.
   </p>
   <p><b>Expected results:</b>
     The goal is to make Polly more usable when using only the NPM. Milestones, not necessarily all to be reached in this GSoC, are:
     <br/>
     1. Make all of Polly's functionality available in the NPM (or decide to deprecate/remove it)
     <br/>
     2. Better integration into the NPM (such as supporting PassInstrumentation); if the NPM turns out to be inadequate, use only a monolithic function pass.
     <br/>
     3. Replace the legacy pass manager in regression tests.
     <br/>
     4. Be ready for complete removal of the legacy pass manager in LLVM.
   </p>
   <p><b>Confirmed mentor:</b>
     <a href="https://github.com/Meinersbur">Michael Kruse</a>
   </p>
   <p><b>Desirable skills:</b>
     Understanding of the C++ template pattern used by the new pass manager (<a href="https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern">CRTP</a>, <a href="https://en.wikipedia.org/wiki/Mixin">Mixins</a>, etc).
     Familiarity with how LLVM can be <a href="https://www.lurklurk.org/linkers/linkers.html">linked</a> (static, BUILD_SHARED_LIBS, and SHLIB/DYLIB) and its <a href="https://www.llvm.org/docs/WritingAnLLVMPass.html#building-pass-plugins">plugin loading mechanisms</a> (static, -load and -load-pass-plugin).
     Ideally, prior experience with LLVM's new pass manager.
   </p>
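   <p>As a rough illustration of the CRTP mixin pattern mentioned above, here
     is a self-contained sketch. It does not use LLVM's headers;
     <tt>PassMixin</tt>, <tt>Function</tt> and <tt>CountingPass</tt> are
     hypothetical stand-ins, not the actual new pass manager classes:</p>

```cpp
#include <string>

// Hypothetical stand-in for the IR unit a pass runs on.
struct Function {
  int numInstructions;
};

// CRTP mixin: the base class is parameterized on the derived class, so it can
// call into the derived class without virtual dispatch.
template <typename DerivedT> struct PassMixin {
  // Shared functionality supplied by the mixin.
  static std::string name() { return DerivedT::passName(); }

  // Statically dispatches to DerivedT::run.
  bool runOn(Function &F) { return static_cast<DerivedT *>(this)->run(F); }
};

// A pass only needs to inherit from the mixin and provide run()/passName().
struct CountingPass : PassMixin<CountingPass> {
  static std::string passName() { return "counting-pass"; }

  int seen = 0;

  bool run(Function &F) {
    seen += F.numInstructions;
    return false; // reports "nothing changed" in this sketch
  }
};
```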
   <p><b>Project size:</b> Medium</p>
   <p><b>Difficulty:</b> Medium/Hard</p>
   <p><b>Discourse</b> <a href="https://discourse.llvm.org/t/61174">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>Enzyme</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_tblgen">Move Enzyme Instruction Transformation Rules to Tablegen</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Enzyme performs automatic differentiation (in the calculus sense) of LLVM programs. This enables users to use Enzyme to perform various algorithms such as back-propagation in ML or scientific simulation on existing code for any language that lowers to LLVM. The support for an increasing number of LLVM Versions (7-main), AD modes (Reverse, Forward, Forward-Vector, Reverse-Vector, Jacobian), and libraries (BLAS, OpenMP, MPI, CUDA, ROCm, ...) leads to a steadily increasing code base. In order to limit complexity and help new contributors we would like to express our core logic using LLVM Tablegen. The applicant is free to decide how to best map the program transformation abstractions within Enzyme to Tablegen.
   </p>
   <p><b>Expected results:</b>
      1. A working tablegen rule generation system within Enzyme
      <br/>
      2. Moving several existing rules to the new autogenerated system (e.g. LLVM instructions, LLVM intrinsics, BLAS calls, MPI calls, ...)
      <br/>
    </p>

   <p><b>Confirmed mentor:</b>
     <a href="mailto:wmoses@mit.edu">William Moses</a>, Valentin Churavy
   </p>
   <p><b>Desirable skills:</b>
     Good knowledge of C++, calculus, and LLVM and/or Clang, and/or MLIR internals. Experience with Tablegen, Enzyme or automatic differentiation would be nice, but can also be learned in the project.
   </p>
   <p><b>Project size:</b> Large</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Discourse</b> <a href="https://discourse.llvm.org/t/enzyme-moving-instruction-rules-to-tablegen/61176">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_vector">Vector Reverse-Mode Automatic Differentiation</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Enzyme performs automatic differentiation (in the calculus sense) of LLVM programs. This enables users to use Enzyme to perform various algorithms such as back-propagation in ML or scientific simulation on existing code for any language that lowers to LLVM. Enzyme already implements forward and reverse mode automatic differentiation. Enzyme also implements vector forward mode automatic differentiation, which allows Enzyme to batch the derivative computation of several objects in a single call. The goal of this project is to extend this capability in order to perform vector reverse mode. In doing so, multiple sweeps of reverse mode automatic differentiation can be performed at the same time, reducing memory, time, and otherwise generally enabling further optimization.
   </p>
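   <p>To make the idea of batching concrete, the following sketch shows
     vector forward mode on a toy function using dual numbers that carry
     several tangent components at once. This is a textbook illustration
     written for this page, not Enzyme's implementation or API; all names are
     hypothetical. Vector reverse mode would batch adjoints in the same spirit
     instead of tangents:</p>

```cpp
#include <array>
#include <cstddef>

// Dual number carrying N tangent components at once ("vector" forward mode).
template <std::size_t N> struct Dual {
  double value;
  std::array<double, N> tangent;
};

template <std::size_t N>
Dual<N> operator+(const Dual<N> &a, const Dual<N> &b) {
  Dual<N> r{a.value + b.value, {}};
  for (std::size_t i = 0; i < N; ++i)
    r.tangent[i] = a.tangent[i] + b.tangent[i]; // sum rule
  return r;
}

template <std::size_t N>
Dual<N> operator*(const Dual<N> &a, const Dual<N> &b) {
  Dual<N> r{a.value * b.value, {}};
  for (std::size_t i = 0; i < N; ++i)
    r.tangent[i] = a.tangent[i] * b.value + a.value * b.tangent[i]; // product rule
  return r;
}

// Toy function f(x, y) = x * y + x.
inline Dual<2> f(const Dual<2> &x, const Dual<2> &y) { return x * y + x; }

// One evaluation of f propagates both partial derivatives.
inline Dual<2> gradientOfF(double x, double y) {
  Dual<2> dx{x, {1.0, 0.0}}; // seed d/dx in slot 0
  Dual<2> dy{y, {0.0, 1.0}}; // seed d/dy in slot 1
  return f(dx, dy);
}
```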
   <p><b>Expected results:</b>
      Vectorized version of reverse mode automatic differentiation
    </p>

   <p><b>Confirmed mentor:</b>
     <a href="mailto:wmoses@mit.edu">William Moses</a>, Tim Gymnich
   </p>
   <p><b>Desirable skills:</b>
      Good knowledge of C++ and some experience with LLVM APIs. Experience with Enzyme or automatic differentiation would be nice, but can also be learned in the project.
   </p>
   <p><b>Project size:</b> Medium</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Discourse</b> <a href="https://discourse.llvm.org/t/enzyme-vector-reverse-mode-automatic-differentiation/61177">URL</a></p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_pm">Enable The New Pass Manager</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Enzyme is a compiler plugin for LLVM that performs automatic differentiation (in the calculus sense) of LLVM programs. This enables users to use Enzyme to perform various algorithms such as back-propagation in ML or scientific simulation on existing code for any language that lowers to LLVM.
     <br/>
     Enzyme integrates into frontends through the use of an LLVM plugin that can be loaded into Clang, LLVM (opt), the linker (lld), libraries (HIPRtc), directly loaded (Julia), among others (Flang, Rust, etc).
     <br/>
     While using various pieces of machinery from the new pass manager internally, Enzyme does not currently automatically register its transformation passes when using the new pass manager. This creates problems for users on LLVM 13 or above, where the new pass manager is run by default: they may not understand why they get linker errors because their code was not differentiated (currently they must add a flag to select the old pass manager).
     <br/>
     The goal of this project is to enable Enzyme to be called by the new pass manager in LLVM and generally create a coherent user experience.
     <br/>
   </p>

   <p><b>Expected results:</b>
       1. Enzyme can be called by the new pass manager
       <br/>
       2. [Optional] Additional syntactic sugar that makes it easier to use Enzyme.
       <br/>
    </p>

   <p><b>Confirmed mentor:</b>
     <a href="mailto:wmoses@mit.edu">William Moses</a>, Valentin Churavy
   </p>
   <p><b>Desirable skills:</b>
      Good knowledge of C++ and LLVM. Experience with Enzyme would be nice, but can also be learned in the project.
   </p>
   <p><b>Project size:</b> Small</p>
   <p><b>Difficulty:</b> Medium</p>
   <p><b>Discourse</b> <a href="https://discourse.llvm.org/t/enzyme-enable-the-new-pass-manager/61178">URL</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc21">Google Summer of Code 2021</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p>
     Welcome prospective Google Summer of Code 2021 Students! This document is your
     starting point to finding interesting and important projects for LLVM, Clang,
     and other related sub-projects. This list of projects was not developed only for
     Google Summer of Code; these are open projects that really need developers to work on
     them and that are very beneficial for the LLVM community. </p>

   <p>We encourage you to look through this list and see which projects excite you
     and match well with your skill set. We also invite proposals not on this
     list. You must propose your idea to the LLVM community through our
     developers' mailing list (llvm-dev@lists.llvm.org or specific subproject mailing
     list). Feedback from the community is a requirement for your proposal to be
     considered and hopefully accepted.
   </p>

   <p>The LLVM project has participated in Google Summer of Code for several years
     and has had some very successful projects. We hope that this year is no
     different and look forward to hearing your proposals. For information on how to
     submit a proposal, please visit the Google Summer of Code
     main <a href="https://developers.google.com/open-source/gsoc/">website.</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>LLVM</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_distributing_lit">Distributed lit testing</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The LLVM lit test suites consist of thousands of small independent tests.
     Due to the number of tests, it can take a long time to run the full suite,
     even on a high-spec computer. Builds are already distributable across
     multiple computers available on the same network, using software such as
     distcc or icecream, so running tests on a single machine becomes a potential
     bottleneck. One way to speed up running of the tests could be to distribute
     test execution across many computers too. Lit provides a test sharding
     mechanism, which allows multiple computers to run parts of the same
     testsuite in tandem, but this currently assumes access to a single common
     filesystem, which may not be possible in all cases, and knowledge of which
     machines the suite can currently be run on.

     This project’s goal is to update the existing lit harness (or write a
     wrapper around it) to allow distribution of the tests in this way, with the
     idea that developers can write their own interface between the harness and
     the distribution system of their choice. This harness may need to be able to
     identify test dependencies such as input files and executables, send the
     tests to the distribution system (possibly in batches), and receive, collate
     and report the results to the user, in a similar manner to how lit already
     does.
   </p>
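   <p>The basic idea behind sharding can be sketched as an index-based
     partition of the test list. The sketch below is written in C++ for
     consistency with the other examples on this page; lit itself is a Python
     tool, and the function <tt>selectShard</tt> is a hypothetical helper, not
     part of lit:</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the tests assigned to shard `shard` (0-based) out of `numShards`
// shards, dealing tests out round-robin. Assumes numShards > 0.
inline std::vector<std::string>
selectShard(const std::vector<std::string> &tests, std::size_t shard,
            std::size_t numShards) {
  std::vector<std::string> selected;
  for (std::size_t i = shard; i < tests.size(); i += numShards)
    selected.push_back(tests[i]);
  return selected;
}
```

   <p>Each machine computes its own slice from the same ordered list, which
     is why every participant needs a consistent view of the test suite.</p>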

   <p><b>Expected results:</b> An easy-to-use harness as described above. Some
     evidence that, given a distributed system, a user can expect test
     suite execution to speed up when using that harness.</p>

   <p><b>Confirmed mentor:</b> James Henderson</p>
   <p><b>Desirable skills:</b> Good knowledge of Python. Familiarity with LLVM
     lit testing. Some knowledge of distribution systems would also be
     beneficial.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_loop_heuristics">Learning Loop Transformation Heuristics</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     This is a short description, please reach out to Johannes (jdoerfert on
     IRC) and Mircea Trofin if it sounds interesting.

     We successfully introduced an ML framework for inliner decisions, and now we want
     to expand the scope. In this project we will look at loop transformation
     heuristics, such as the unroll factor. As a motivational example we can look
     at a small trip count <a href="https://godbolt.org/z/Eeqcvs">dgemm</a> which
     we optimize pretty poorly. With the nounroll pragmas we do a better job, but
     we are still not close to gcc.

     The project is open-ended and we could look at various passes/heuristics
     concurrently.
   </p>

   <p><b>Preparation resources:</b> The ML inliner framework in the LLVM code
   base as well as the <a href="https://arxiv.org/abs/2101.04808">paper</a>. LLVM
   transform passes (that are based on heuristics), e.g., loop unroll.</p>

   <p><b>Expected results:</b> Measurably better performance with a learned
   predictor, potentially a set of "classical" heuristics derived from the ML
   model.</p>

   <p><b>Confirmed Mentor:</b> Johannes Doerfert, Mircea Trofin</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of ML, C++, self-motivation.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_ir_fuzzing">Fuzzing LLVM-IR Passes</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     This is a short description, please reach out to Johannes (jdoerfert on
     IRC) if it sounds interesting.

     Fuzzing often reveals a myriad of bugs. CSmith (and others) showed how to do
     this with C-like languages and we have used <a
     href="https://www.youtube.com/watch?v=UBbQ_s6hNgg">LLVM-IR fuzzing</a> in
     the past successfully. In this project we will apply fuzzing to new passes
     that are in development, e.g., the Attributor pass. We want to find and fix
     crashes but also other bugs, including compile time performance problems.
   </p>

   <p><b>Preparation resources:</b> The <a
     href="https://llvm.org/docs/FuzzingLLVM.html#llvm-opt-fuzzer">LLVM fuzzer
     infrastructure</a>. LLVM passes that we might want to fuzz, e.g. the
   Attributor pass. Prior IR-Fuzzing work
   (https://www.youtube.com/watch?v=UBbQ_s6hNgg)</p>


   <p><b>Expected results:</b> Crashes, maybe also a way to catch non-crash
   bugs, including performance problems.</p>

   <p><b>Confirmed Mentor:</b> Johannes Doerfert</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of C++, self-motivation.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_ir_assume"><tt>llvm.assume</tt> the missing pieces</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     This is a short description, please reach out to Johannes (jdoerfert on
     IRC) if it sounds interesting.

     <tt>llvm.assume</tt> is a powerful mechanism to retain knowledge. Since its
     inception it has been improved multiple times, but there are major
     extensions still outstanding which we want to tackle in this project.
     An incomplete list of topics includes:
     <ul>
       <li> range-based assumptions, design idea 3) in the <a href="https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html">RFC</a>. </li>
       <li> outline arbitrary assumption/assertion code, design idea 2) in the <a href="https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html">RFC</a>. </li>
       <li> side-effect free assumptions, see <a href="https://reviews.llvm.org/D89054">this review</a>. </li>
       <li> more knowledge retention usages </li>
       <li> less interference with optimizations </li>
     </ul>

   </p>
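   <p>For readers new to the intrinsic: in Clang, the
     <tt>__builtin_assume</tt> builtin is lowered to a call to
     <tt>llvm.assume</tt>. The sketch below wraps it in a macro so the code
     also compiles with other compilers; the macro <tt>ASSUME</tt> and the
     function <tt>scaledDown</tt> are hypothetical names chosen for this
     example:</p>

```cpp
// Clang lowers __builtin_assume to a call to llvm.assume. For other
// compilers this sketch simply expands to a no-op (an assumption made here
// to keep the example portable).
#if defined(__clang__)
#define ASSUME(cond) __builtin_assume(cond)
#else
#define ASSUME(cond) ((void)0)
#endif

// With the assumption in place, the optimizer is allowed to treat the
// divisor as strictly positive when compiling this function. Calling it
// with a non-positive divisor would be undefined behavior under Clang.
inline int scaledDown(int value, int divisor) {
  ASSUME(divisor > 0);
  return value / divisor;
}
```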

   <p><b>Preparation resources:</b> The llvm.assume usage, the assumption
   cache, the "enable-knowledge-retention" option, the <a
   href="https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html">RFC</a>
   and <a href="https://reviews.llvm.org/D89054">this review</a>.
   </p>


   <p><b>Expected results:</b> New llvm.assume use cases, improved performance through knowledge retention, optimization based on assertions.</p>

   <p><b>Confirmed Mentor:</b> Johannes Doerfert</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of C++, self-motivation.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_ir_issues">Fix fundamental issues in LLVM's IR</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     LLVM's IR has fundamental, long-standing issues. Many are related to
     undefined behavior. Others are simply fallout from underspecification
     and different interpretations by different people.
     <a href="https://github.com/AliveToolkit/alive2">Alive2</a> is a tool that
     detects bugs in LLVM's optimizations automatically. Using Alive2, we track
     bugs exposed by the unit tests on a
     <a href="https://web.ist.utl.pt/nuno.lopes/alive2/">dashboard</a>.
   </p>

   <p><b>Expected results:</b>
     1) Report and fix bugs detected by Alive2.
     2) Pick one fundamental IR issue and
     make progress towards fixing it, including proposing fixes for the
     <a href="https://llvm.org/docs/LangRef.html">semantics</a>, testing
     fixes to the semantics by running Alive2 over the LLVM unit tests and
     medium-sized programs, testing the performance of the semantic fixes, and
     fixing performance regressions.
   </p>
   <p><b>Confirmed Mentor:</b> Nuno Lopes, Juneyoung Lee</p>
   <p><b>Desirable skills:</b> Intermediate C++; willingness to learn about LLVM
     IR semantics; experience reading papers (preferred).
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_utilize_loopnest">Utilize LoopNest Pass</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     The LoopNest pass was added recently, and there are no existing
     passes utilizing it. Before the LoopNest pass existed, if you wanted to write a
     pass that works on a loop nest, you had to choose between a function
     pass and a loop pass. If you chose to write it as a function pass, you
     lost the ability to add loops dynamically back to the pipeline. If you
     decided to write it as a loop pass, you wasted compile time
     traversing to your pass and returning right away whenever the given loop was not the
     outermost loop. In this project, we want to use the recently introduced
     LoopNest pass for passes intended for loop nests, keeping the same ability
     as the LoopPass to dynamically add loops to the pipeline. In addition, we want to
     improve the current implementation of LoopNestPass when necessary.
   </p>
   <p><b>Expected results (possibilities):</b>
     Utilize LoopNest Pass for some existing transformations/analyses.
   </p>
   <p><b>Confirmed Mentors:</b>
     Whitney Tsang, Ettore Tiotto
   </p>
   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++, self-motivation.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="openmp_gpu_jit">JIT-ing OpenMP GPU kernels transparently</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     This is a short description, please reach out to Johannes (jdoerfert on
     IRC) if it sounds interesting.

     OpenMP GPU kernels are usually lowered to native binaries, e.g., cubin, and
     embedded into the host object. At runtime, OpenMP "plugins" will connect with
     the device driver, e.g., CUDA, to load and run such embedded binary images.
     In this project we want to develop a new plugin that takes LLVM-IR code, optimizes
     the IR with kernel parameters known only at runtime, and then generates the GPU
     binary for consumption by other plugins. Similar to the <a
       href=https://openmp.llvm.org/docs/design/Runtimes.html#remote-offloading-plugin>remote
       offload plugin</a> we can do this transparently to the user. In addition to the JIT
     infrastructure setup in the plugin we will need to embed the IR into the host object.

   </p>

   <p><b>Preparation resources:</b> OpenMP target offloading infrastructure, LLVM JIT infrastructure.
   </p>


   <p><b>Expected results:</b> A JIT-capable offload plugin which can achieve superior performance when kernel specialization enables optimizations.</p>

   <p><b>Confirmed Mentor:</b> Johannes Doerfert</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of C++, JIT compilation, self-motivation.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>OpenACC</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="openacc_rt_diagnostics">OpenACC Diagnostics from the OpenMP Runtime</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Clacc and Flacc are projects to introduce OpenACC support to Clang and
     Flang.  For that purpose, OpenACC runtime support is being developed on top
     of LLVM's OpenMP runtime.  However, diagnostics emitted by LLVM's OpenMP
     runtime are expressed in terms of OpenMP concepts, and so those diagnostics
     are not always meaningful to OpenACC users.  This project should address
     this issue in two steps:
     <ol>
       <li>
         Develop a mechanism that selects OpenACC versions of diagnostics that
         are emitted as a result of OpenACC-related calls into the runtime.  This
         mechanism should be general enough that it could be used for programming
         languages besides OpenMP and OpenACC.  One possible approach is to
         extend internationalization mechanisms already present in some
         components of the OpenMP runtime.
       </li>
       <li>
         Provide OpenACC translations for existing OpenMP diagnostics.  This step
         requires an understanding of the relationship between OpenACC and OpenMP
         as implemented in Clacc and Flacc.
       </li>
     </ol>
     Many components of OpenACC support that will depend upon this project have
     not yet been upstreamed and are under development.  A high-level
     understanding of those efforts is helpful for this project and can be
     provided by the mentors.  Nevertheless, this project can be completed in
     upstream LLVM's OpenMP runtime now independently of those efforts.
   </p>
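   <p>One way to picture step 1 is a message catalog keyed by both the
     diagnostic and the programming model, with a fallback to the OpenMP
     wording. The sketch below is purely illustrative: the identifiers and the
     message texts are invented for this example and do not come from LLVM's
     OpenMP runtime:</p>

```cpp
#include <map>
#include <string>
#include <utility>

// Programming model on whose behalf the runtime was called.
enum class Model { OpenMP, OpenACC };

// Hypothetical diagnostic identifiers; a real runtime would have many more.
enum class DiagId { MapFailed, NoDevice };

// Looks up the wording for `id` in the vocabulary of `model`, falling back to
// the OpenMP wording when no translation has been provided yet.
inline std::string diagnosticText(DiagId id, Model model) {
  // Message texts are invented for this sketch.
  static const std::map<std::pair<DiagId, Model>, std::string> catalog = {
      {{DiagId::MapFailed, Model::OpenMP},
       "could not map variable in 'map' clause"},
      {{DiagId::MapFailed, Model::OpenACC},
       "could not copy variable in 'copy' clause"},
      {{DiagId::NoDevice, Model::OpenMP}, "no offload device available"},
  };
  auto it = catalog.find({id, model});
  if (it == catalog.end())
    it = catalog.find({id, Model::OpenMP});
  return it == catalog.end() ? std::string("unknown diagnostic") : it->second;
}
```

   <p>Keying on the model rather than hard-coding OpenACC keeps the mechanism
     open to further programming languages, as step 1 asks.</p>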

   <p>
     <b>Expected results:</b> A version of upstream LLVM's OpenMP runtime that
     can emit OpenACC diagnostics as needed.
   </p>

   <p><b>Confirmed Mentors:</b> Valentin Clement, Joel E. Denny</p>

   <p>
     <b>Desirable skills:</b> Intermediate C++; Experience with OpenACC or
     OpenMP
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>Polly</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="polly_isl_bindings">Use official isl C++ bindings</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Polly uses algorithms from the
     <a href="http://isl.gforge.inria.fr/">Integer Set Library (isl)</a>, which is a
     library written in C. It uses reference-counting for memory management.
     Getting reference counting correct is much easier in C++ using RAII,
     therefore we created a C++ binding for isl:
     <a href="https://github.com/llvm/llvm-project/blob/main/polly/lib/External/isl/include/isl/isl-noexceptions.h">isl-noexceptions.h</a>.
     Since then, isl also gained two official C++ bindings,
     <a href="https://github.com/llvm/llvm-project/blob/main/polly/lib/External/isl/include/isl/cpp.h">cpp.h</a>
     and
     <a href="https://github.com/llvm/llvm-project/blob/main/polly/lib/External/isl/include/isl/cpp-checked.h">cpp-checked.h</a>.

     We would like to replace the Polly-maintained C++ bindings with the upstream
     bindings. Unfortunately, this is not an in-place replacement. Differences
     include how errors are checked, method names, which functions are
     considered as operator/constructor overloads and the set of exported functions.
     This will require changing Polly's uses of the C++ bindings and submitting
     patches to isl to export additional functionality needed by Polly.
   </p>
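   <p>To illustrate why RAII helps with reference counting, here is a
     self-contained sketch of a C++ wrapper around a reference-counted C
     object. The C-style API (<tt>c_obj_alloc</tt>, <tt>c_obj_copy</tt>,
     <tt>c_obj_free</tt>) is a hypothetical stand-in written for this example;
     it is not isl's API, and the wrapper is not taken from any of the
     bindings named above:</p>

```cpp
#include <utility>

// Hypothetical C-style, reference-counted object (stand-in for a C library).
struct c_obj {
  int refcount;
};
inline c_obj *c_obj_alloc() { return new c_obj{1}; }
inline c_obj *c_obj_copy(c_obj *o) {
  ++o->refcount;
  return o;
}
inline void c_obj_free(c_obj *o) {
  if (o && --o->refcount == 0)
    delete o;
}

// RAII wrapper: copies take a reference, destruction drops one, and moves
// transfer ownership without touching the count.
class Obj {
  c_obj *ptr = nullptr;

public:
  Obj() : ptr(c_obj_alloc()) {}
  Obj(const Obj &other) : ptr(other.ptr ? c_obj_copy(other.ptr) : nullptr) {}
  Obj(Obj &&other) noexcept : ptr(std::exchange(other.ptr, nullptr)) {}
  // Copy-and-swap handles both copy and move assignment.
  Obj &operator=(Obj other) noexcept {
    std::swap(ptr, other.ptr);
    return *this;
  }
  ~Obj() { c_obj_free(ptr); }

  // Current reference count, or 0 for a moved-from wrapper.
  int useCount() const { return ptr ? ptr->refcount : 0; }
};
```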

   <p><b>Expected results:</b>
     Reduce the differences between the Polly-maintained isl-noexceptions.h
     bindings and one of the two C++ bindings that isl supports. Due to
     isl-noexceptions.h exporting more functions and classes than the upstream
     bindings do, a complete replacement will probably be out of reach, but
     even reducing the differences will reduce the maintenance cost of Polly's
     isl-noexceptions.h.
   </p>

   <p><b>Confirmed mentor:</b> Michael Kruse</p>
   <p><b>Desirable skills:</b>
     Deep knowledge of C++, in particular RAII and move-semantics. Interest in API design. Ideally, you have already written a library's header file.
     Experience with the isl library would be nice, but can also be learned in the project.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>Enzyme</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_blas">Integrate custom derivatives of BLAS, Eigen, and similar routines into Enzyme</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     <a href="https://enzyme.mit.edu/">Enzyme</a> performs automatic differentiation
     (in the calculus sense) of LLVM programs. This enables users to use Enzyme to
     perform various algorithms such as back-propagation in ML
     or scientific simulation on existing code for any language that lowers to LLVM.

     Enzyme does so by applying the chain rule to every instruction in every
     function called by the original function to be differentiated. While functional,
     this is not necessarily optimal for high-level matrix operations which may
     have algebraic properties for faster derivative computation.

     Enzyme also has a mechanism for specifying a custom gradient for a given function.
     If a custom derivative is available, Enzyme will use that rather than falling back
     to deriving its own.

     Many programs use BLAS libraries to efficiently compute matrix and tensor
     operations. This project would enable high-performance automatic differentiation
     of BLAS and similar libraries (such as Eigen) by specifying custom derivative
     rules for their operations.
   </p>
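   <p>
     As a rough illustration of what a custom derivative rule buys, the sketch below
     writes the reverse-mode rule for a matrix product C = A * B as two further matrix
     products. It uses plain Python lists; the function names are hypothetical and are
     not Enzyme's or any BLAS library's API.
   </p>

```python
# Illustrative only: plain-Python matrices, not Enzyme's or a BLAS library's API.

def matmul(a, b):
    """Return the product of a (n x k) and b (k x m) as nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def matmul_reverse(a, b, grad_c):
    """Custom reverse-mode rule for C = A * B.

    Given grad_c = dL/dC, return (dL/dA, dL/dB) as two more matrix
    products instead of differentiating every scalar multiply-add.
    """
    grad_a = matmul(grad_c, transpose(b))  # dL/dA = dL/dC * B^T
    grad_b = matmul(transpose(a), grad_c)  # dL/dB = A^T * dL/dC
    return grad_a, grad_b


def loss(a, b):
    """Hypothetical scalar loss: the sum of all entries of A * B."""
    return sum(sum(row) for row in matmul(a, b))
```

   <p>
     Because the rule is itself expressed as matrix products, each gradient can be
     handed back to the same optimized routine that computed the forward result.
   </p>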

   <p><b>Expected results:</b>
     Efficient differentiation of BLAS and Eigen codes by writing custom
     derivative rules for matrix and tensor operations.
   </p>

   <p><b>Confirmed mentor:</b> <a href="mailto:wmoses@mit.edu">William Moses</a>, Johannes Doerfert</p>
   <p><b>Desirable skills:</b>
     Good knowledge of C++, calculus, and linear algebra. Experience with BLAS, Eigen,
     or Enzyme would be nice, but can also be learned in the project.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_swift">Integrate Enzyme into Swift to provide high-performance differentiation in Swift</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     <a href="https://enzyme.mit.edu/">Enzyme</a> performs automatic differentiation
     (in the calculus sense) of LLVM programs. This enables users to use Enzyme to
     perform various algorithms such as back-propagation in ML
     or scientific simulation on existing code for any language that lowers to LLVM.

     While this functions for any frontend that emits LLVM IR, it may be desirable
     to have closer integration between Enzyme and the frontend for the sake of
     passing additional information and creating a better user experience.

     Swift provides automatic differentiation through custom derivative rules
     specified in the front end. Enzyme could be integrated directly with
     Swift, differentiating the eventual LLVM IR, but it would lose out on all this
     additional information about custom derivatives. Moreover, calling into
     Enzyme naively would forgo type checking, fine-grained AD-specific debug information,
     and the various other nice tools that Swift provides to users of AD.

     This project would seek to integrate Enzyme and the Swift front end to provide
     both a nice user experience for Swift programmers who want to use Enzyme
     to enable high-performance automatic differentiation, and also to allow Enzyme
     to take advantage of derivative-specific metadata already available in Swift.
   </p>

   <p><b>Expected results:</b>
     Creation of a custom type-checked linguistic construct in Swift for calling Enzyme.
     Mechanisms for passing Swift's differentiation-specific metadata for use by Enzyme.
   </p>

   <p><b>Confirmed mentor:</b> <a href="mailto:wmoses@mit.edu">William Moses</a>, Vassil Vassilev</p>
   <p><b>Desirable skills:</b>
     Good knowledge of C++ and Swift. Experience with Enzyme or automatic differentiation
     would be nice, but can also be learned in the project.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_fixed">Differentiation of Fixed-Point Arithmetic</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     <a href="https://enzyme.mit.edu/">Enzyme</a> performs automatic differentiation
     (in the calculus sense) of LLVM programs. This enables users to use Enzyme to
     perform various algorithms such as back-propagation in ML
     or scientific simulation on existing code for any language that lowers to LLVM.

     In a variety of fields, it is desirable to compute on fixed-point values
     (e.g. integers) rather than floating point values. This avoids certain truncation
     errors that may be critical to a given application. Moreover, particular pieces
     of hardware may simply be more efficient on fixed point rather than floating
     point values.

     This project would seek to extend Enzyme to support differentiation of not only
     floating point base values, but also fixed point base values.
   </p>
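   <p>
     To make the idea concrete, here is a minimal sketch of a fixed-point multiply and
     its reverse-mode rule. It assumes a Q16.16 format and a multiply defined as the
     full-width product shifted right by the scale, loosely in the spirit of LLVM's
     <tt>llvm.smul.fix</tt> intrinsic. The helper names are hypothetical, and this is
     not how Enzyme implements it.
   </p>

```python
FRAC_BITS = 16  # assumption: a Q16.16 format chosen only for this sketch


def to_fixed(x):
    """Encode a real number as a scaled integer."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Decode a scaled integer back to a real number."""
    return q / (1 << FRAC_BITS)


def mul_fix(a, b, scale=FRAC_BITS):
    """Fixed-point multiply: full-width product shifted right by `scale`."""
    return (a * b) >> scale


def mul_fix_adjoint(a, b, out_bar, scale=FRAC_BITS):
    """Reverse-mode rule for out = mul_fix(a, b).

    In real terms out = a * b, so a_bar = out_bar * b and
    b_bar = out_bar * a; both products are evaluated in fixed point.
    """
    return mul_fix(out_bar, b, scale), mul_fix(out_bar, a, scale)
```

   <p>
     The adjoint stays in the same fixed-point format as the primal values, so a type
     analysis has to know that these integers carry a scale rather than plain counts.
   </p>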

   <p><b>Expected results:</b>
     Implementation of adjoints for LLVM fixed point intrinsics, requisite type analysis
     rules, and integration into a front-end for an end-to-end test.
   </p>

   <p><b>Confirmed mentor:</b> <a href="mailto:wmoses@mit.edu">William Moses</a></p>
   <p><b>Desirable skills:</b>
     Good knowledge of C++, calculus, and LLVM internals. Experience with Enzyme or
     automatic differentiation would be nice, but can also be learned in the project.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="enzyme_rust">Integrate Enzyme into Rust to provide high-performance differentiation in Rust</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     <a href="https://enzyme.mit.edu/">Enzyme</a> performs automatic differentiation
     (in the calculus sense) of LLVM programs. This enables users to use Enzyme to
     perform various algorithms such as back-propagation in ML
     or scientific simulation on existing code for any language that lowers to LLVM.

     While this functions for any frontend that emits LLVM IR, it may be desirable
     to have closer integration between Enzyme and the frontend for the sake of
     passing additional information and creating a better user experience.

     This project would seek to integrate Enzyme and the Rust front end to provide
     a nice user-experience for Rust programmers who want to use Enzyme
     to enable high-performance automatic differentiation. This also potentially
     involves integration of LLVM plugin support/custom codegen into rustc.
   </p>

   <p><b>Expected results:</b>
     Creation of a custom type-checked linguistic construct in Rust for calling Enzyme.
     Mechanisms for parsing Rust's type information (represented as LLVM debug
     info) directly into type analysis.
   </p>

   <p><b>Confirmed mentor:</b> <a href="mailto:wmoses@mit.edu">William Moses</a></p>
   <p><b>Desirable skills:</b>
     Good knowledge of C++ and Rust. Experience with Enzyme or automatic differentiation
     would be nice, but can also be learned in the project.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="static_analyzer_profling">Clang Static Analyzer performance profiling</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     <ul>
      <li>
        Chart how much time is spent in transfer functions – including (but not
        limited to!) checker callbacks.
      </li>
      <li>
        Add llvm Statistics and Timers for quickly obtaining precise and concise
        dumps without external profilers. Statistics on state splits might be
        particularly interesting!
      </li>
      <li>
        Measure time spent analyzing specific stack frames. Say, how much time
        do we spend inlining <tt>std::string</tt> methods? This time could be
        saved if we add custom models for these methods instead.
      </li>
     </ul>
   </p>
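   <p>
     The kind of bookkeeping described above can be sketched as a small group of named
     timers that accumulate time and call counts per region. This is a language-neutral
     illustration written in Python, not LLVM's Statistic or Timer classes; the class
     and region names are hypothetical.
   </p>

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TimerGroup:
    """Accumulates elapsed time and call counts per named region."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so that tests can use a fake clock
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def region(self, name):
        """Time the body of a `with` block under the given region name."""
        start = self._clock()
        try:
            yield
        finally:
            self.seconds[name] += self._clock() - start
            self.calls[name] += 1

    def report(self):
        """Return (name, seconds, calls) rows, slowest region first."""
        rows = [(n, self.seconds[n], self.calls[n]) for n in self.seconds]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

   <p>
     Wrapping each transfer function or checker callback in such a region would give a
     per-category breakdown without attaching an external profiler.
   </p>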

   <p><b>Confirmed mentor:</b>  Artem Dergachev</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="static_analyzer_constraint_solver">Clang Static Analyzer constraint solver improvements</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     CSA has a small in-house constraint solver that is pretty trivial but super
     fast.  The goal is to support range-based logic for some of the symbolic
     operators, while keeping it linear.  Additionally, a unit-test framework
     can be designed specifically for testing constraint solvers (right now it’s
     tested rather awkwardly). This project has a couple of interesting
     properties.  It can be segmented into small chunks, and each of these
     chunks has a non-trivial solution.  It might introduce you to a world of
     solvers (it is a good idea to check your ideas with some heavy-weight
     solver such as z3). And because the existing solver is simple, there is a
     myriad of possible extensions to try.
   </p>
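   <p>
     As a minimal sketch of what range-based logic for a symbolic operator could look
     like, the code below tracks a closed integer interval per symbol, derives the range
     of a sum, and narrows a range under an assumption. It ignores overflow and is not
     the analyzer's actual data structure; all names are hypothetical.
   </p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed integer interval [lo, hi]; lo > hi encodes the empty range."""
    lo: int
    hi: int

    def is_empty(self):
        return self.lo > self.hi

    def intersect(self, other):
        return Range(max(self.lo, other.lo), min(self.hi, other.hi))


def add_ranges(a, b):
    """Range of x + y when x is in a and y is in b (overflow not modeled)."""
    return Range(a.lo + b.lo, a.hi + b.hi)


def assume_less_than(sym, bound):
    """Narrow `sym` under the assumption sym < bound.

    An empty result means the assumption is infeasible on this path.
    """
    return sym.intersect(Range(sym.lo, bound - 1))
```

   <p>
     Each operation here touches only the interval endpoints, which is the property
     that keeps such a solver cheap.
   </p>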

   <p><b>Confirmed mentor:</b>  Valeriy Savchenko</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="lldb_diagnostics">A structured approach to diagnostics in LLDB</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     <ul>
      <li>
        Design and integrate a new diagnostic abstraction (similar to
        clang::Diagnostic) to report errors, warnings and notes in a structured
        way.
      </li>
      <li>
        Allow us to differentiate between bugs (unexpected errors) and things
        the debugger simply doesn’t know (expected errors). Be smart about
        printing global errors only once. Have the ability of being verbose and
        have additional metadata (source location, DWARF unit, object file, etc,
        depending on the type of error and where it originated). </li>
      <li>
        Should be compatible and tightly integrated with the existing classes,
        such as the Status and CommandReturnObject.
      </li>
     </ul>
   </p>
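   <p>
     One possible shape for such an abstraction is sketched below: a diagnostic carries
     a severity, a flag separating bugs from expected failures, and optional metadata,
     while an engine suppresses repeats. This is a language-neutral illustration in
     Python with hypothetical names, not LLDB's or Clang's API.
   </p>

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    NOTE = "note"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Diagnostic:
    severity: Severity
    message: str
    is_bug: bool = False   # unexpected error vs. something the debugger cannot know
    metadata: tuple = ()   # e.g. (("object_file", "a.out"),)


class DiagnosticEngine:
    """Collects diagnostics and renders each distinct one only once."""

    def __init__(self):
        self._seen = set()
        self.rendered = []

    def report(self, diag, verbose=False):
        """Render `diag` unless an identical one was already reported."""
        key = (diag.severity, diag.message)
        if key in self._seen:
            return False
        self._seen.add(key)
        line = f"{diag.severity.value}: {diag.message}"
        if verbose and diag.metadata:
            details = ", ".join(f"{k}={v}" for k, v in diag.metadata)
            line += f" [{details}]"
        self.rendered.append(line)
        return True
```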

   <p><b>Confirmed mentor:</b>
     <a href="mailto:teemperor@gmail.com,jonas@devlieghere.com?subject=[GSoC]%20LLDB%20Diagnostics">Jonas Devlieghere and Raphael Isemann</a></p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc20">Google Summer of Code 2020</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p>
     LLVM participation in Google Summer of Code 2020 was very successful and resulted
     in many interesting projects contributed to LLVM. For the list of accepted and
     completed projects, please take a look at the Google Summer of Code
     <a href="https://summerofcode.withgoogle.com/archive/2020/organizations/5902726635978752/">
       website.</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>LLVM</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_ipo">Improve inter-procedural analyses and optimizations</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     This is a short description, please reach out to Johannes (jdoerfert on IRC)
     if it sounds interesting.

     During GSoC'19 we built the Attributor framework to improve the
     inter-procedural capabilities of LLVM. This is useful on its own but
     especially in situations where inlining is impossible or undesirable.

     In this GSoC project we will look at capabilities not yet available in the
     Attributor and for the potential to connect the Attributor with existing
     intra- and inter-procedural optimizations.

     In this project there is a lot of freedom to determine the actual tasks but
     we will provide a pool of smaller and medium sized tasks that can be chosen
     from as well.
   </p>

   <p><b>Preparation resources:</b> The Attributor YouTube videos from the
   LLVM Developers Meeting 2019 and the recording of the IPO panel from the same
   meeting. The Attributor framework as well as other existing inter-procedural
   analyses and optimizations in LLVM.</p>

   <p><b>Expected results:</b> Measurably better IPO, especially visible in cases
                               where inlining is not an option or undesirable.</p>

   <p><b>Confirmed Mentor:</b> Johannes Doerfert</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of C++, self motivation.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_par">Improve parallelism-aware analyses and optimizations</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     This is a short description, please reach out to Johannes (jdoerfert on IRC)
     if it sounds interesting.

     With the OpenMPOpt pass (<a href='https://reviews.llvm.org/D69930'>under
       review</a>) we started to teach the LLVM optimization pipeline about
     OpenMP parallelism encoded as OpenMP runtime calls.

     In this GSoC project we will look at capabilities not yet available in the
     OpenMPOpt pass and for the potential to connect existing intra- and
     inter-procedural optimizations, e.g. the Attributor.

     In this project there is a lot of freedom to determine the actual tasks but
     we will provide a pool of smaller and medium sized tasks that can be chosen
     from as well.
   </p>

   <p><b>Preparation resources:</b> The "Optimizing Indirections, using
   abstractions without remorse" video on YouTube from the LLVM Developers
   Meeting 2018. The papers "Compiler Optimizations for OpenMP" and "Compiler
   Optimizations For Parallel Programs" both by J. Doerfert and H. Finkel (the
   slides for these are potentially even more useful).</p>

   <p><b>Expected results:</b> Measurably better performance or program analysis
   results for parallel programs with a focus on OpenMP.</p>

   <p><b>Confirmed Mentor:</b> Johannes Doerfert</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of C++, self motivation.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_dbg_invariant">Make LLVM passes debug info invariant</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Generating debug information is one of the fundamental tasks a compiler
     typically fulfills. The generated executable code should not
     depend on the presence of debug information.
     <br><br>
     Unfortunately there are known cases in LLVM where code generation differs
     depending on whether debug information is enabled (`-g`) or not. These kinds
     of bugs can lead to a bad debugging experience, ranging from unexpected execution
     behaviour to programs running fine in debug mode while crashing
     without debug information.
     <br><br>
     The issue likely does not have a single cause but is triggered during different
     passes on different architectures. One such reason is the insertion of Call
     Frame Information (CFI) in the compiler backend during frame lowering and
     other later passes. The presence of CFI instructions seems to change
     instruction scheduling which therefore leads to different generated code.
   </p>

   <p><b>Preparation resources:</b>
     <ul>
       <li>
         <a href="https://llvm.org/PR37728">PR37728</a> is a
         meta-bug that collects several related issues of differing codegen.
       </li>
       <li>
         <a href="https://llvm.org/PR37240">PR37240</a> is a
         bug discussing the CFI issue mentioned above.
       </li>
       <li>
         The following
         <a href="http://lists.llvm.org/pipermail/llvm-dev/2019-September/135433.html">
         RFC</a> discusses some possible mitigation strategies and gives some
         background information on the CFI issue.
       </li>
     </ul>
   </p>
   <p><b>Expected results:</b>
     <ul>
     <li>
       Write some tooling based on existing scripts to automatically generate
       examples of differing codegen. This is intended as a starting task to get
       to know the existing LLVM tools, learn to read LLVM's internal outputs etc.
     </li>
     <li>
       Choose one or more (depending on the difficulty) bugs that cause codegen
       differences and try to provide patches to fix them. We would be
       particularly interested in the mentioned CFI issue but working on some of
       the other related bugs is also absolutely fine.
     </li>
     </ul>
   </p>
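   <p>
     A starting point for the tooling task might look like the sketch below: take the
     assembly produced with and without `-g`, drop directives that exist only for debug
     or unwind information, and report the first instruction that differs. The list of
     directives is an illustrative guess and the function names are hypothetical; a real
     tool would also need to skip the debug sections themselves.
   </p>

```python
import re
from itertools import zip_longest

# Assumption: these directives carry debug or unwind info only. The list is
# an illustrative guess, not an exhaustive or official one.
DEBUG_ONLY = re.compile(r"^\s*\.(cfi_\w+|loc|file)\b")


def strip_debug_lines(asm):
    """Drop debug-only directives and blank lines from assembly text."""
    kept = []
    for line in asm.splitlines():
        if not line.strip() or DEBUG_ONLY.match(line):
            continue
        kept.append(line.strip())
    return kept


def first_codegen_difference(asm_plain, asm_debug):
    """Return the first differing (plain, debug) line pair, or None.

    A side that ran out of lines is reported as None in the pair.
    """
    plain = strip_debug_lines(asm_plain)
    debug = strip_debug_lines(asm_debug)
    for a, b in zip_longest(plain, debug):
        if a != b:
            return a, b
    return None
```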

   <p><b>Confirmed Mentors:</b> Paul Robinson and David Tellenbach</p>

   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++, some familiarity with general computer
     architecture, some familiarity with the x86 or Arm/AArch64 instruction set.
   </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_mergesim">Improve MergeFunctions to incorporate MergeSimilarFunction patches and ThinLTO Support</a>
 </div>
 <!-- *********************************************************************** -->

  <div class="www_text">
   <p><b>Description of the project:</b> The MergeSimilarFunctions pass is able to
     merge not just identical functions, but also functions with small differences in
     their instructions to reduce code size. It does this by inserting control flow
     and an additional argument in the merged function to account for the
     differences.

     This work was presented at
     the <a href="http://llvm.org/devmtg/2013-11/#talk3">LLVM Dev Meeting in
     2013</a>. A more detailed description was published in a paper at
     <a href="http://dl.acm.org/citation.cfm?id=2597811">LCTES 2014</a>. The code
     was released to the community at the time. Meanwhile, the pass has been in
     production use at QuIC for the past few years and has been actively
     maintained internally. In order to magnify the impact of
     MergeSimilarFunctions, it has been ported to ThinLTO and the patches have
     been upstreamed (see stack of 5 patches mentioned below). But instead of
     replacing the existing MergeFunctions pass in LLVM-upstream, the community
     suggested we improve the existing one with the ideas from
     MergeSimilarFunctions and then leverage ThinLTO on top of that. The
     MergeSimilarFunctions pass used in ThinLTO gives impressive code size reduction
     across a wide range of workloads and the work was presented at
     <a href="https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk2">LLVM-dev
     2018</a>. The LLVM project would greatly benefit from this code size
     optimization as most embedded-system applications (think smartphones) are
     constrained on code size.
   </p>
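   <p>
     The transformation described above can be pictured with a toy example: two
     functions that differ in a single operation are replaced by one merged body that
     takes an extra argument selecting the differing instruction, and the original names
     become thin forwarding thunks. The example is written in Python for brevity and the
     function names are hypothetical; the actual pass works on LLVM IR.
   </p>

```python
def scale_then_add(x, y):
    t = x * 2
    return t + y


def scale_then_sub(x, y):
    t = x * 2
    return t - y


def merged(x, y, is_sub):
    """One body for both originals; `is_sub` selects the differing operation."""
    t = x * 2  # shared code is emitted only once
    return t - y if is_sub else t + y


# The original symbols survive as thunks that forward to the merged body.
def scale_then_add_thunk(x, y):
    return merged(x, y, False)


def scale_then_sub_thunk(x, y):
    return merged(x, y, True)
```

   <p>
     The saving grows with the size of the shared code, while the cost is the added
     argument and branch, which is why the differences must be small.
   </p>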
   <p><b>Preparation resources:</b>
   <ul>
     <li>
       Stack of patches:
       <ul>
         <li>
           <a href="https://reviews.llvm.org/D52896">MergeSimilarFunctions 1/n: a code size pass to merge functions with small differences</a>
         </li>
         <li>
           <a href="https://reviews.llvm.org/D52898">[Porting MergeSimilarFunctions 2/n] Changes to DataLayout</a>
         </li>
         <li>
           <a href="https://reviews.llvm.org/D52966">[Merge SImilar Function ThinLTO 3/n] Add hash code to function summary</a>
         </li>
         <li>
           <a href="https://reviews.llvm.org/D53253">[Merge SImilar Function ThinLTO 4/n] Make merge function decisions before the thin-lto stage</a>
         </li>
         <li>
           <a href="https://reviews.llvm.org/D53254">[Merge SImilar Function ThinLTO 5/n] Set up similar function to be imported</a>
         </li>
       </ul>
       The patches can be easily applied to LLVM-trunk and would give a developer a decent head start ;).
     </li>
     <li>List of llvm-dev mailing list posts on previous discussions around Merge Functions
       <ul>
         <li><a href="http://lists.llvm.org/pipermail/llvm-dev/2019-January/129835.html">Link1</a></li>
         <li><a href="http://lists.llvm.org/pipermail/llvm-dev/2019-March/131066.html">Link2</a></li>
         <li><a href="http://lists.llvm.org/pipermail/llvm-dev/2019-February/129863.html">Link3</a></li>
         <li><a href="http://lists.llvm.org/pipermail/llvm-dev/2019-January/129832.html">Link4</a></li>
       </ul>
     </li>
     <li>
       <a href="http://dl.acm.org/citation.cfm?id=2597811">The original paper: LCTES 2014</a>
     </li>
     <li>
       <a href="https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk2">Video and slides of the presentation</a>
     </li>
   </ul>
   </p>
   <p><b>Expected results:</b>
     <ul>
       <li>
     Improve MergeFunctions to have feature parity with MergeSimilarFunctions.
       </li>
       <li>
     Enable MergeFunctions in ThinLTO.
       </li>
     </ul>
   </p>

   <p><b>Confirmed Mentors:</b> Aditya Kumar (hiraditya on IRC and phabricator), JF Bastien (jfb on phabricator)</p>

   <p><b>Desirable skills:</b>
     Course on compiler design, SSA Representation,
     Intermediate knowledge of C++, Familiarity with LLVM Core.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="llvm_dwarf_yaml2obj">Add DWARF support to yaml2obj</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     LLVM provides a tool called yaml2obj which converts a YAML document into an
     object file, for various different file formats such as ELF, COFF and
     Mach-O, along with obj2yaml which does the inverse. The tool is commonly
     used to test parts of LLVM, as YAML is often easier to use to describe an
     object file than raw assembly and more maintainable than a pre-built binary.
     DWARF is a debugging file format commonly used by LLVM. Many of the tests
     for LLVM’s DWARF emission are written in assembly, but it would be nicer to
     write them in YAML. However, yaml2obj does not properly support emission of
     DWARF sections. This project is to add functionality to yaml2obj to make
     writing test inputs for DWARF tests simpler, particularly for ELF objects.
   </p>

   <p><b>Preparation resources:</b>
     Reading up on the DWARF file format will be useful, in particular the
     standards available at http://dwarfstd.org/Download.php. Also, familiarising
     yourself with the basics of the ELF file format, as described here
     https://www.sco.com/developers/gabi/latest/contents.html, may be beneficial.
   </p>
   <p><b>Expected results:</b>
     The ability to use yaml2obj to generate DWARF sections for object files.
     Particularly important is ensuring the input YAML can be more easily
     understood than the equivalent assembly.
   </p>

   <p><b>Confirmed Mentors:</b> James Henderson</p>

   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++.
   </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_hotcold">Improve hot cold splitting to aggressively outline small blocks</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
  <p><b>Description of the project:</b> Hot Cold Splitting in LLVM is an IR level
    function splitting transformation. The goal of hot/cold splitting is to improve
    the memory locality of code and to help reduce the startup working set. The splitting pass
    does this by identifying cold blocks and moving them into separate functions. Because it
    is implemented at the IR level, all back-end targets benefit from it.

    It is a relatively new optimization that was recently presented at
    the <a href="https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech8">LLVM Dev Meeting in
    2019</a>; the slides are <a href="https://llvm.org/devmtg/2019-10/slides/Kumar-HotColdSplitting.pdf">here</a>.
    Most of the benefit comes from outlining small blocks (e.g., __assert_rtn), so the goal of this project
    is to identify potential blocks via static analysis e.g., exception handling code, optimizing personality functions.

    Use a cost model to ensure outlining reduces the code size of the caller, and use tail calls whenever appropriate to save
    instructions.

  </p>
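  <p>
    A cost model of the kind mentioned above could, in its simplest form, compare the
    size of a cold block against the instructions needed to call its outlined copy.
    The sketch below does exactly that; the cost constants and field names are
    hypothetical and chosen only for illustration, not taken from LLVM's pass.
  </p>

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    size: int          # instruction count of the block
    is_cold: bool      # e.g. ends in a noreturn call or sits on an unwind path
    live_ins: int = 0  # values that must be passed to the outlined function


# Assumption: per-call costs in instructions, picked only for this sketch.
CALL_COST = 1
ARG_COST = 1


def outlining_benefit(block):
    """Instructions removed from the caller if `block` is outlined."""
    return block.size - (CALL_COST + ARG_COST * block.live_ins)


def blocks_to_outline(blocks):
    """Names of cold blocks whose outlining shrinks the caller."""
    return [b.name for b in blocks
            if b.is_cold and outlining_benefit(b) > 0]
```

  <p>
    Under this model a small block with many live-in values is left in place, because
    setting up the call would cost more than the block itself.
  </p>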
  <p><b>Preparation resources:</b>
  <ul>
    <li>
      <a href="http://lists.llvm.org/pipermail/llvm-dev/2019-January/129606.html">Update on hot cold splitting</a>
    </li>
    <li>
      The following two papers provide earlier work on hot cold splitting. While these papers are a good start, LLVM's
      HCS implementation differs completely in two aspects: a) it is implemented at the IR level and outlines basic
      blocks as functions rather than naked branches; b) it is based on regions and outlines a set of basic blocks.
      <ul>
        <li>
          <a href="http://pages.cs.wisc.edu/~fischer/cs701.f05/code.positioning.pdf">Original paper on hot cold splitting by
            Pettis and Hansen.</a> Section 5 on procedure splitting is an interesting one. It has nice examples ;) to help
          understand why HCS works.
        </li>
        <li>
          <a href="https://www.cs.cmu.edu/afs/cs/academic/class/15745-s07/www/papers/p80-cohn.pdf">Paper on hot cold
            splitting</a>. The paper provides some details on one approach to split functions. This is helpful to get a
          different perspective and may help get new ideas.
        </li>
      </ul>
    </li>
    <li>
      <a href="https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech8">Video and slides of the presentation</a>
    </li>
  </ul>
  </p>
  <p><b>Expected results:</b>
    <ul>
      <li>
        Improve Hot Cold Splitting to detect and outline cold blocks from a program via static analysis or profile
        information. Use an appropriate cost model to weigh the benefit of HCS.
        In case compile time overhead becomes quadratic, come up with a cost model to detect when quadratic behavior
        gets triggered and bail out based on a compiler flag.
      </li>
    </ul>
  </p>

  <p><b>Confirmed Mentors:</b> Aditya Kumar (hiraditya on IRC and phabricator)</p>

  <p><b>Desirable skills:</b>
    Course on compiler design, SSA Representation,
    Intermediate knowledge of C++, Familiarity with LLVM Core.
  </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_pass_order">Advanced Heuristics for Ordering Compiler Optimization Passes</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
  <p><b>Description of the project:</b>
 Selecting optimization passes for a given application is a very important but
 non-trivial problem because of the huge size of the compiler transformation
 space (incl. pass ordering). While the existing heuristics can provide high
 performance code for certain applications, they cannot easily benefit a wide
 range of application codes. The goal of the project is to learn the interplay
 between LLVM transformation passes and code structures, then improve the
 existing heuristics (or replace the heuristics with machine learning-based
 models) so that the LLVM compiler can provide a superior order of the passes
 customized per application.
  </p>
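  <p>
    Why pass order matters can be seen even on a toy example. The sketch below defines
    two simplified passes over a made-up three-address IR and exhaustively searches the
    orderings for the smallest result. The IR, the passes, and the search are all
    hypothetical stand-ins; exhaustive search is only feasible at this toy scale, which
    is what motivates better heuristics or learned models.
  </p>

```python
from itertools import permutations

# Toy IR: a list of (dest, op, args) tuples. Ops are "const" (args holds the
# value), "add" (args holds two names), and "ret" (args holds one name).


def const_fold(prog):
    """Replace an add of two known constants with a single const."""
    consts, out = {}, []
    for dest, op, args in prog:
        if op == "add" and all(a in consts for a in args):
            op, args = "const", (consts[args[0]] + consts[args[1]],)
        if op == "const":
            consts[dest] = args[0]
        out.append((dest, op, args))
    return out


def dead_code_elim(prog):
    """Remove instructions whose result is never used (one backward sweep)."""
    used, out = set(), []
    for dest, op, args in reversed(prog):
        if op == "ret" or dest in used:
            out.append((dest, op, args))
            if op != "const":  # const args are values, not names
                used.update(args)
    return out[::-1]


PASSES = {"fold": const_fold, "dce": dead_code_elim}


def run_pipeline(prog, order):
    """Apply the named passes to `prog` in the given order."""
    for name in order:
        prog = PASSES[name](prog)
    return prog


def best_order(prog):
    """Exhaustively pick the pass order that leaves the fewest instructions."""
    return min(permutations(PASSES),
               key=lambda order: len(run_pipeline(prog, order)))
```

  <p>
    Folding first turns the add into a constant and leaves its operands dead, so the
    later cleanup can remove them; running the cleanup first finds nothing to remove.
  </p>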
  <p><b>Expected results (possibilities):</b>
  <ul>
    <li>
 Insights about (implicit) dependences between existing passes.
    </li>
    <li>
 New pass pipelines (think -O3a, -O3b, ...) selectable by the user that tend to perform substantially better on certain kinds of programs.
    </li>
    <li>
 An improved LLVM pass heuristic or new machine learning-based models that can select
 the best order for LLVM transformation passes based on code structures.
    </li>
   </ul>
  </p>

  <p><b>Preparation resources:</b>
  <ul>
    <li>
 HERCULES: Strong Patterns towards More Intelligent Predictive Modeling, Eunjung Park; Christos Kartsaklis; John Cavazos, IEEE ICPP’14
 https://ieeexplore.ieee.org/abstract/document/6957226
    </li>

    <li>
 Predictive Modeling in a Polyhedral Optimization Space, Eunjung Park, John Cavazos, Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen & P. Sadayappan, IJPP’13
 https://link.springer.com/article/10.1007/s10766-013-0241-1
    </li>

    <li>
 Machine Learning in Compiler Optimization, Zheng Wang and Michael O’Boyle, IEEE Magazine 2018.
 https://ieeexplore.ieee.org/document/8357388
    </li>
  </ul>
  </p>

  <p><b>Confirmed Mentors:</b> EJ Park, Giorgis Georgakoudis, Johannes Doerfert</p>

  <p><b>Desirable skills:</b>
     C++, Python, experience with LLVM and learning-based prediction preferable.
  </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_ml_scc">Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
  <p><b>Description of the project:</b>
 Current machine learning models for compiler optimization select the best
 optimization strategies for functions based on isolated per function analysis.
 In this approach, the constructed models are not aware of any relationships
 with other functions around it (callers or callees) which can be helpful to
 decide the best optimization strategies for each function. In this project, we
 want to explore the SCC (Strongly Connected Components) call graph to add
 inter-procedural features in constructing machine learning-based models to find
 the best optimization strategies per function. Moreover, we want to explore the
 case that it is helpful to group strongly related functions together and
 optimize them as a group, instead of per function.
  </p>
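  <p>
    To illustrate the grouping step, the sketch below computes the strongly connected
    components of a small call graph with Tarjan's algorithm and aggregates one feature
    per component. The call graph, the feature (summed instruction counts), and the
    function names are hypothetical; this is not LLVM's CGSCC infrastructure.
  </p>

```python
def strongly_connected_components(call_graph):
    """Tarjan's algorithm; call_graph maps each function to its callees."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(fn):
        index[fn] = low[fn] = len(index)
        stack.append(fn)
        on_stack.add(fn)
        for callee in call_graph.get(fn, ()):
            if callee not in index:
                visit(callee)
                low[fn] = min(low[fn], low[callee])
            elif callee in on_stack:
                low[fn] = min(low[fn], index[callee])
        if low[fn] == index[fn]:  # fn is the root of a component
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == fn:
                    break
            sccs.append(component)

    for fn in call_graph:
        if fn not in index:
            visit(fn)
    return sccs


def scc_features(call_graph, sizes):
    """Hypothetical per-SCC feature: total instruction count of its members."""
    return [(sorted(c), sum(sizes[f] for f in c))
            for c in strongly_connected_components(call_graph)]
```

  <p>
    Components come out callees first, so mutually recursive functions are grouped and
    visited before their callers, which suits bottom-up feature aggregation.
  </p>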
  <p><b>Expected results (possibilities):</b>
  <ul>
    <li>
 Improved heuristics for existing (inter-procedural) passes, e.g. to weigh inlining versus function cloning based on code features.
    </li>
    <li>
 Machine learning models to select the best optimizations using code features
 and inter-procedural analysis. This model can be used for functions in
 isolation or groups of functions, e.g., CGSCCs.
    </li>
  </ul>
  </p>

  <p><b>Preparation resources:</b>
  <ul>
    <li>
 HERCULES: Strong Patterns towards More Intelligent Predictive Modeling, Eunjung Park; Christos Kartsaklis; John Cavazos, IEEE ICPP’14
 https://ieeexplore.ieee.org/abstract/document/6957226
    </li>

    <li>
 Predictive Modeling in a Polyhedral Optimization Space, Eunjung Park, John Cavazos, Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen & P. Sadayappan, IJPP’13
 https://link.springer.com/article/10.1007/s10766-013-0241-1
    </li>

    <li>
 Machine Learning in Compiler Optimization, Zheng Wang and Michael O’Boyle, Proceedings of the IEEE, 2018.
 <a href="https://ieeexplore.ieee.org/document/8357388">https://ieeexplore.ieee.org/document/8357388</a>
    </li>
  </ul>
  </p>

  <p><b>Confirmed Mentors:</b> EJ Park, Giorgis Georgakoudis, Johannes Doerfert</p>

  <p><b>Desirable skills:</b>
     C++ and Python; experience with LLVM and learning-based prediction is preferable.
  </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_postdominators">Add PostDominatorTree in LoopStandardAnalysisResults</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     There is currently no easy way to use the result of
     PostDominatorTreeAnalysis in a loop pass, as PostDominatorTreeAnalysis is a
     function analysis, and it is not included in LoopStandardAnalysisResults. If one adds
     PostDominatorTreeAnalysis to LoopStandardAnalysisResults, then all loop passes
     need to preserve it, meaning that all loop passes need to make sure the result is up to
     date. In this project, we want to modify some commonly used utilities to generate a
     list of updates, which can be consumed by different updaters, e.g. DomTreeUpdater to
     update DominatorTree and PostDominatorTree, and MSSAU to update MemorySSA,
     etc., instead of only updating the DominatorTree. In addition, we want to change
     existing loop passes to preserve the PostDominatorTree. Finally, we want to add
     PostDominatorTree to LoopStandardAnalysisResults.
   </p>
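   <p>
     The "list of updates" idea can be pictured with a small language-neutral
     sketch. The toy utility below splits a CFG edge and, instead of updating
     one particular analysis itself, returns the edge insertions and deletions
     it performed, so that any number of updaters can consume them. The names
     <tt>split_edge</tt>, <tt>RecordingUpdater</tt> and <tt>apply_updates</tt>
     are hypothetical and only model the design; they are not LLVM APIs.
   </p>
   <pre>
def split_edge(cfg, src, dst, new_block):
    """Split the CFG edge from src to dst; return the updates describing it."""
    cfg[src] = [new_block if succ == dst else succ for succ in cfg[src]]
    cfg[new_block] = [dst]
    return [
        ("insert", src, new_block),
        ("insert", new_block, dst),
        ("delete", src, dst),
    ]


class RecordingUpdater:
    """Stand-in for an updater such as DomTreeUpdater or MSSAU."""

    def __init__(self, name):
        self.name = name
        self.applied = []

    def apply_updates(self, updates):
        # A real updater would repair its analysis here; this one only records.
        self.applied.extend(updates)


# Toy CFG (assumed): a loop header with a body and an exit block.
cfg = {"header": ["body", "exit"], "body": ["header"], "exit": []}
updates = split_edge(cfg, "header", "exit", "header.exit")

# The same list is handed to every interested updater.
updaters = [RecordingUpdater("dom+postdom"), RecordingUpdater("memoryssa")]
for updater in updaters:
    updater.apply_updates(updates)
   </pre>
   <p>
     The point of the design is that the utility no longer needs to know which
     analyses exist; keeping PostDominatorTree up to date then becomes a matter
     of passing the same list to one more consumer.
   </p>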
   <p><b>Expected results (possibilities):</b>
     PostDominatorTree added to LoopStandardAnalysisResults, so that it
     can be used by loop passes. More common utilities changed to generate a list of updates
     that different updaters can easily consume.
   </p>
   <p><b>Confirmed Mentors:</b>
     Whitney Tsang, Ettore Tiotto, Bardia Mahjour
   </p>
   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++, self-motivation.
   </p>
   <p><b>Preparation resources:</b>
     <a href="https://reviews.llvm.org/rL336163">rL336163</a>,
     <a href="http://llvm.org/doxygen/classllvm_1_1DomTreeUpdater.html">DomTreeUpdater</a>,
     <a href="https://llvm.org/doxygen/classllvm_1_1PostDominatorTreeAnalysis.html">PostDominatorTreeAnalysis</a>,
     <a href="http://llvm.org/doxygen/structllvm_1_1LoopStandardAnalysisResults.html">LoopStandardAnalysisResults</a>
   </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_loopnest">Create LoopNest Pass</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Currently, if you want to write a pass that works on a loop
     nest, you have to pick either a function pass or a loop pass. If you choose to write
     it as a function pass, then you lose the ability to add loops dynamically back to the
     pipeline. If you decide to write it as a loop pass, then you waste compile time
     traversing to your pass and returning right away when the given loop is not the outermost
     loop. In this project, we want to create a LoopNestPass, which transformations
     intended for loop nests can inherit from, and which has the same ability as the LoopPass to
     dynamically add loops to the pipeline. In addition, we want to create all the adaptors required to
     add loop nest passes at different points of the pass builder.
   </p>
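   <p>
     The compile-time argument can be made concrete with a toy model. In the
     sketch below, a nest transformation written as a loop pass is invoked on
     every loop and bails out on inner loops, while the same transformation
     written as a loop nest pass is invoked once per outermost loop. The
     <tt>Loop</tt> class and both driver functions are hypothetical and do not
     mirror LLVM's pass manager.
   </p>
   <pre>
class Loop:
    def __init__(self, name, subloops=()):
        self.name = name
        self.parent = None
        self.subloops = list(subloops)
        for sub in self.subloops:
            sub.parent = self


def all_loops(top_level):
    """Yield every loop, outer loops before their subloops."""
    for loop in top_level:
        yield loop
        yield from all_loops(loop.subloops)


def run_as_loop_pass(top_level, transform):
    invocations = 0
    for loop in all_loops(top_level):
        invocations += 1
        if loop.parent is not None:
            continue  # not the outermost loop: return right away
        transform(loop)
    return invocations


def run_as_loop_nest_pass(top_level, transform):
    invocations = 0
    for outermost in top_level:
        invocations += 1
        transform(outermost)
    return invocations


# Toy function (assumed): a triple nest i/j/k and a single loop m.
nest = Loop("i", [Loop("j", [Loop("k")])])
single = Loop("m")
transformed = []
loop_pass_calls = run_as_loop_pass([nest, single], transformed.append)
nest_pass_calls = run_as_loop_nest_pass([nest, single], transformed.append)
   </pre>
   <p>
     Both drivers transform the same two nests, but the loop pass variant is
     entered four times and the loop nest variant only twice.
   </p>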
   <p><b>Expected results (possibilities):</b>
     Transformations/Analyses can be written as LoopNestPass,
     without compromising compile time or usability.
   </p>
   <p><b>Confirmed Mentors:</b>
     Whitney Tsang, Ettore Tiotto
   </p>
   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++, self-motivation.
   </p>
   <p><b>Preparation resources:</b>
     <a href="https://reviews.llvm.org/D68789">D68789</a>,
     <a href="https://llvm.org/doxygen/classllvm_1_1PassBuilder.html">PassBuilder</a>
   </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_instdump">Instruction properties dumper and checker</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     TableGen is flexible and allows the end-user to define and set common properties of
     records (instructions). Every target has dozens or hundreds of such instruction
     properties. As target code evolves, the td files become more and more complicated,
     and it becomes harder to see whether the setting of some properties is necessary, or
     even correct. For example, is the hasSideEffects property correctly set on all
     instructions?

     One can manually search through the TableGen-generated files, or write a
     script to run TableGen and match the output for some specific properties, but a
     standalone utility that can dump and check instruction properties
     systematically (e.g., by also allowing targets to define verification rules) might be
     better from a build-process-management standpoint. This can help to find
     hidden bugs and hence improve the overall codegen code quality. In
     addition, the utility can be used to write regression tests for instruction
     properties, which will increase the quality and precision of LLVM's
     regression tests.
   </p>
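   <p>
     To give a feel for the "dump and check" workflow, here is a minimal
     sketch that operates on hand-written records. The instruction names, the
     record layout and the two rules are invented for this example; a real
     tool would obtain the records from TableGen and let each target supply
     its own rules.
   </p>
   <pre>
# Hypothetical records standing in for properties extracted from td files.
INSTRUCTIONS = [
    {"name": "ADD", "mayLoad": False, "mayStore": False, "hasSideEffects": False},
    {"name": "LOAD", "mayLoad": True, "mayStore": False, "hasSideEffects": False},
    {"name": "STORE", "mayLoad": False, "mayStore": False, "hasSideEffects": False},
    {"name": "FENCE", "mayLoad": False, "mayStore": False, "hasSideEffects": True},
]

# Each rule is a description plus a predicate that is True for a valid record.
RULES = [
    ("stores must set mayStore",
     lambda inst: not inst["name"].startswith("STORE") or inst["mayStore"]),
    ("loads must set mayLoad",
     lambda inst: not inst["name"].startswith("LOAD") or inst["mayLoad"]),
]


def dump_property(instructions, prop):
    """Return the names of all instructions on which prop is set."""
    return [inst["name"] for inst in instructions if inst.get(prop)]


def check(instructions, rules):
    """Return (instruction name, rule description) for every violation."""
    violations = []
    for inst in instructions:
        for description, is_ok in rules:
            if not is_ok(inst):
                violations.append((inst["name"], description))
    return violations


side_effecting = dump_property(INSTRUCTIONS, "hasSideEffects")
violations = check(INSTRUCTIONS, RULES)
   </pre>
   <p>
     In this toy data set the checker flags <tt>STORE</tt>, whose
     <tt>mayStore</tt> bit was deliberately left unset. The same
     <tt>check</tt> call could back a regression test that fails whenever the
     list of violations is not empty.
   </p>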
   <p><b>Expected results (possibilities):</b>
     A standalone LLVM tool or utility that can dump and check instruction properties systematically.
   </p>
   <p><b>Confirmed Mentors:</b>
     Hal Finkel, Jinsong Ji, Qingshan Zhang
   </p>
   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++, self-motivation.
   </p>
 </div>

 <!-- *********************************************************************** -->

 <div class="www_subsubsection">
   <a name="llvm_movecode">Unify ways to move code or check if code is safe to be moved</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
     Determining whether it is safe to move code around is
     implemented in several transformations in LLVM (e.g. canSinkOrHoistInst in LICM,
     or makeLoopInvariant in Loop). Each of these implementations may return different
     results for a given query, making code motion safety checks inconsistent and
     duplicated. In addition, the mechanism for doing the actual code motion is also
     different in each transformation. Code duplication causes maintenance problems and
     increases the time taken to write new transformations. In this project, we want to first
     identify all the existing ways in loop transformations (whether function or loop passes)
     to check if code is safe to move, and to move code, and then create a standardized way to do
     so.
   </p>
   <p><b>Expected results (possibilities):</b>
     A standardized superset of all the existing ways in loop
     transformations of checking if code is safe to be moved and of moving code.
   </p>

   <p><b>Confirmed Mentors:</b>
     Whitney Tsang, Ettore Tiotto, Bardia Mahjour
   </p>
   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++, self-motivation.
   </p>
   <p><b>Preparation resources:</b>
     <a href="https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Transforms/Utils/CodeMoverUtils.h">CodeMoverUtils.h</a>,
     <a href="https://llvm.org/doxygen/LICM_8cpp_source.html">LICM.cpp</a>,
     <a href="https://llvm.org/doxygen/classllvm_1_1Loop.html">Loop</a>
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>MLIR</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
 <p>All the items in the list of
 <a href="https://mlir.llvm.org/getting_started/openprojects/">open projects</a>
 are open to GSoC. Feel free to propose your own ideas as well on
 <a href="https://llvm.discourse.group/c/llvm-project/mlir">Discourse</a>.
 </p></div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-sa-cplusplus-checkers">Find null smart pointer dereferences
                                         with the Static Analyzer</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b>
     The Clang Static Analyzer already knows how to prevent crashes caused by
     null pointer dereference in arbitrary code; however, it often "gives up"
     when the code is too complicated. In particular, implementation details
     of C++ standard classes, even simple ones such as smart pointers
     or optionals, may be too convoluted for the Analyzer to fully understand.
     Moreover, the exact behavior depends on which implementation of
     the Standard Library is used (e.g., GNU libstdc++ or LLVM's own libc++).
   </p>
   <p>
     We can enable the Analyzer to find more bugs in modern C++ code
     by teaching it explicitly about the behavior of C++ standard classes,
     and therefore skipping the whole process in which the Analyzer
     tries to understand all the implementation details on its own.
     For example, we could teach it that a default-constructed smart pointer
     is null, and any attempt to dereference it would result in a crash.
     The project would therefore consist in manually providing implementations
     for various methods of standard classes.
   </p>

   <p><b>Expected results: </b>
     We want the Static Analyzer to emit warnings when a null smart pointer
     dereference would occur in the code. For example:
     <pre>
     #include &lt;memory&gt;

     int foo(bool flag) {
       std::unique_ptr&lt;int&gt; x;  <i>// note: Default constructor produces a null unique pointer;</i>

       if (flag)                <i>// note: Assuming 'flag' is false;</i>
         return 0;              <i>// note: Taking false branch</i>

       return *x;               <i>// warning: Dereferenced smart pointer 'x' is null.</i>
     }
     </pre>
     We should be able to cover at least one class fully, for example, <tt>std::unique_ptr</tt>,
     and then see if we can generalize our results to other classes, such as <tt>std::shared_ptr</tt>
     or the C++17 <tt>std::optional</tt>.
   </p>


   <p><b>Confirmed Mentors:</b> Artem Dergachev, Gábor Horváth</p>

   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++.
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>LLDB</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="lldb-autosuggestions">Support autosuggestions in LLDB's command line</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b> LLDB's command line offers several convenience
     features that are inspired by features of UNIX shells, such as tab completions or a command history.
     One feature that is not implemented yet is 'autosuggestions'. These are suggestions
     for possible commands that the user might want to type, but unlike tab completions they
     are displayed directly behind the cursor while the user is typing a command. A good demonstration
     of what this could look like is the autosuggestions implemented in the <a href="https://fishshell.com">fish shell</a>.
   </p>
   <p>
     This project is about implementing autosuggestions in LLDB's editline-based command shell.
   </p>
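   <p>
     One simple way to produce such a suggestion is to look for the most
     recent history entry that starts with what the user has typed so far and
     to display the remainder of that entry behind the cursor. The sketch
     below shows only this lookup; the function name and the choice of command
     history as the sole source of suggestions are assumptions, and it says
     nothing about how the result would be drawn through editline.
   </p>
   <pre>
def autosuggest(typed, history):
    """Return the text to show behind the cursor, or "" if nothing matches."""
    if not typed:
        return ""
    # Prefer the most recently used command that extends the typed prefix.
    for entry in reversed(history):
        if entry.startswith(typed) and entry != typed:
            return entry[len(typed):]
    return ""


# Example history, oldest entry first.
history = [
    "breakpoint set --name main",
    "frame variable",
    "breakpoint list",
]
suggestion = autosuggest("br", history)
   </pre>
   <p>
     With this history, typing <tt>br</tt> suggests the rest of
     <tt>breakpoint list</tt>, and typing <tt>breakpoint s</tt> suggests the
     rest of <tt>breakpoint set --name main</tt>.
   </p>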
   <p><b>Confirmed Mentors:</b>
     <a href="mailto:teemperor@gmail.com,jonas@devlieghere.com?subject=[GSoC]%20Autosuggestions">Jonas Devlieghere and Raphael Isemann</a></p>
   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="lldb-more-completions">Implement the missing tab completions for LLDB's command line</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b> LLDB's command line offers several convenience
     features that are inspired by features of UNIX shells such as tab completions for commands.
     These tab completions are implemented by a completion engine that is not only used by the
     command line interface of LLDB, but also by graphical interfaces for LLDB such as IDEs.

     While the tab completions in LLDB are really useful, they are currently not implemented for
     all commands and their respective arguments. This project is about implementing the remaining
     completions for the commands in LLDB which will greatly improve the user experience of LLDB.
     Improving existing completions is also part of the project.

     Note that the completions are not a static list of strings but often require inspecting and
     understanding the internal state of LLDB. As LLDB commands and their tab completions cover
     all aspects of LLDB, this project offers a great way to get an overview of all the functionality
     in LLDB.
   </p>
   <p><b>Confirmed Mentor:</b> <a href="mailto:teemperor@gmail.com?subject=[GSoC]%20Completions">Raphael Isemann</a></p>

   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++.
   </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="lldb-reimplement-lldb-cmdline">Reimplement LLDB's command-line commands
   using the public SB API.</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b> Just as LLVM is a library to
     build compilers, LLDB is a library to build debuggers. LLDB vends
     a stable, public SB API. For historical reasons, the LLDB command
     line interface is currently implemented on top of LLDB's private
     API and it duplicates a lot of functionality that is already
     implemented in the public API. Rewriting LLDB's command line
     interface on top of the public API would simplify the
     implementation, eliminate duplicate code, and most importantly
     reduce the testing surface.
   </p>
   <p>
     This work will also provide an opportunity to clean up the SB API
     of commands that have accrued too many overloads over time and
     convert them to make use of option classes to both gather up all
     the variants and also future-proof the APIs.
   </p>
   <p><b>Confirmed Mentors:</b> Adrian Prantl and Jim Ingham</p>

   <p><b>Desirable skills:</b>
     Intermediate knowledge of C++.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="lldb-batch-testing">Add support for batch-testing to the LLDB
     testsuite.</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b>One of the tensions in the
     testsuite is that spinning up a process and getting it to some
     point is not a cheap operation, so you'd like to do a bunch of
     tests when you get there.  But the current testsuite bails at the
     first failure, so you don't want to do many tests since the
     failure of one fails all the others. On the other hand, there are
     some individual test assertions where the failure of the assertion
     <em>should</em> cause the whole test to fail.  For example, if you
     fail to stop at a breakpoint where you want to check some variable
     values, then the whole test should fail.  But if your test then
     wants to check the value of five independent locals, it should be
     able to do all five, and then report how many of the five variable
     assertions failed. We could do this by adding <em>Start</em>
     and <em>End</em> markers for a batch of tests, running all the tests in
     the batch without failing the whole test, and then reporting the
     errors and failing the whole test if appropriate. There might also be
     a nice way to do this in Python using scoped objects for the test
     sections.
   </p>
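   <p>
     The "scoped objects" idea could look roughly like the context manager
     below, where entering the <tt>with</tt> block plays the role of the
     <em>Start</em> marker and leaving it plays the role of the <em>End</em>
     marker. The names <tt>Batch</tt>, <tt>BatchFailure</tt> and
     <tt>expect_equal</tt> are hypothetical; nothing here is part of the
     existing LLDB test suite.
   </p>
   <pre>
class BatchFailure(AssertionError):
    """Raised once at the end of a batch that had failed checks."""


class Batch:
    """Collects failed checks and reports them together when the batch ends."""

    def __init__(self, name):
        self.name = name
        self.failures = []

    def __enter__(self):
        return self

    def expect_equal(self, actual, expected, what):
        # A soft check: record the failure and keep going.
        if actual != expected:
            self.failures.append(
                "%s: expected %r, got %r" % (what, expected, actual))

    def __exit__(self, exc_type, exc_value, traceback):
        # Only report soft failures if no hard failure is already propagating.
        if exc_type is None and self.failures:
            summary = "%s: %d check(s) failed" % (self.name, len(self.failures))
            raise BatchFailure("\n".join([summary] + self.failures))
        return False


def run_example():
    # Stand-in for values read from a stopped process.
    locals_seen = {"a": 1, "b": 2, "c": 30}
    try:
        with Batch("locals") as batch:
            batch.expect_equal(locals_seen["a"], 1, "a")
            batch.expect_equal(locals_seen["b"], 20, "b")
            batch.expect_equal(locals_seen["c"], 3, "c")
    except BatchFailure as error:
        return str(error)
    return "ok"
   </pre>
   <p>
     All three checks run even though the second one fails, and the test still
     fails once at the end with a report covering both mismatches. An ordinary
     assertion raised inside the block, such as failing to stop at the
     breakpoint, is not swallowed and aborts the test immediately.
   </p>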
   <p><b>Confirmed Mentor:</b> Jim Ingham</p>

   <p><b>Desirable skills:</b>
     Intermediate knowledge of Python.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc19">Google Summer of Code 2019</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p>Google Summer of Code 2019 contributed a lot to the LLVM project. For the list of
     accepted and completed projects, please take a look at the Google Summer of Code
     <a href="https://summerofcode.withgoogle.com/archive/2019/organizations/5682474363912192/">website</a>.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>LLVM</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="debuginfo_codegen_mismatch">Debug Info should have no
           effect on codegen</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project:</b>
       Adding Debug Info (compiling with <tt>clang -g</tt>) shouldn't change the
       generated code at all. Unfortunately, we have bugs. These are usually not
       too hard to fix and are a good way to discover new parts of the codebase!
       We suggest building object files both ways and disassembling the
       text sections, which will give cleaner diffs than comparing .s files.
   </p>
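   <p>
     A comparison along these lines could start from two object files built
     with, for example, <tt>clang -c test.c -o nodebug.o</tt> and
     <tt>clang -g -c test.c -o debug.o</tt>, each disassembled with
     <tt>llvm-objdump -d</tt>. The file names are placeholders. The sketch
     below covers only the last step, diffing the two disassembly listings
     after dropping the header line that names the object file; the helper
     names are made up for this example.
   </p>
   <pre>
import difflib


def normalize(disassembly):
    """Drop lines that legitimately differ between the two object files."""
    kept = []
    for line in disassembly.splitlines():
        # The header names the object file; blank lines carry no information.
        if "file format" in line or not line.strip():
            continue
        kept.append(line.rstrip())
    return kept


def codegen_diff(without_debug, with_debug):
    """Return unified-diff lines; an empty list means identical code."""
    return list(difflib.unified_diff(
        normalize(without_debug), normalize(with_debug),
        "no-debug", "debug", lineterm=""))
   </pre>
   <p>
     An empty result suggests that debug info did not affect the text section
     for this input. Any remaining diff lines are a starting point for
     reducing a test case and finding the responsible pass.
   </p>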

   <p><b>Expected results:</b> Reduced test cases, bug reports with analysis
           (e.g., which pass is responsible), possibly patches.</p>

   <p><b>Confirmed Mentor:</b> Paul Robinson</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of C++, some familiarity
         with the x86 or ARM instruction set.</p>
 </div>


 <!-- *********************************************************************** -->
 <div class="www_subsection">
   <a>Clang</a>
 </div>
 <!-- *********************************************************************** -->

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="clang-astimporter-fuzzer">Implement an ASTImporter fuzzer</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b>
     Clang contains an ASTImporter which allows moving declarations and
     statements from one Clang AST to another. This is for example used for
     static analysis across translation units and in LLDB's expression
     evaluator.
   </p>
   <p>
     The current ASTImporter works as intended when moving simple C code from
     one AST to another. However, more complicated declarations such as C++'s
     OOP features and templates are not fully implemented and can cause crashes
     or invalid AST nodes. The bug reports related to these crashes are often
     filed against LLDB's expression evaluator and are rarely submitted with a
     minimal reproducer. This makes improving ASTImporter a time-consuming and
     tedious task.
   </p>
   <p>
     This project is about writing a fuzzer to proactively discover these
     ASTImporter bugs and provide minimal reproducers which make understanding
     and fixing the underlying bug easier.
   </p>
   <p>
     A possible implementation of such a fuzzer and driver could look like this:

   <ul>
     <li>Generate some source code that can be imported (either fully randomly
         or based on existing source code from a user-given code corpus).</li>
     <li>Randomly import a few declarations from this AST. The AST into which
         they are imported can already be populated with declarations.</li>
     <li>Run Clang's code generator over our imported AST.</li>
     <li>If we hit an assert during the import or CodeGen steps we probably
         found an ASTImporter bug.</li>
     <li>The fuzzer driver should now reduce the size of the source code
         until it is as small as possible and still reproduces the crash (e.g.
         by running C-Reduce with an automatically generated test script).</li>
     <li>The reproducer should now be stored in a format so that it can just be
         copied into Clang's regression test suite for the ASTImporter (see
         the <a href="https://github.com/llvm/llvm-project/tree/master/clang/test/Import">clang/test/Import/</a> directory).
         The reproducer must still reproduce the found bug when run as part
         of the test suite.
         </li>
   </ul>
   This is just one possible approach and students are welcome to submit their
   own ideas on how the fuzzer should operate. Approaches that make it possible to
   automatically verify more aspects of the imported AST (e.g. the source
   locations of AST nodes, size of RecordDecls) are encouraged. The fuzzer and
   driver should be implemented in C++ and/or Python.
   </p>
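   <p>
     The reduction step of such a driver can be sketched as a loop that keeps
     deleting lines for as long as the crash still reproduces. The version
     below is a deliberately naive greedy reducer, not C-Reduce, and
     <tt>still_crashes</tt> is a hypothetical callback: in a real driver it
     would run the import and CodeGen steps on the candidate source and report
     whether the same assertion fires.
   </p>
   <pre>
def reduce_lines(source, still_crashes):
    """Greedily drop lines while still_crashes(candidate) stays true."""
    lines = source.splitlines()
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if still_crashes("\n".join(candidate)):
                # The line was not needed to reproduce the crash: drop it
                # and rescan from the start.
                lines = candidate
                changed = True
                break
    return "\n".join(lines)


def fake_crash(source):
    # Stand-in oracle: pretend the importer crashes on this combination.
    return "struct A" in source and "struct B : A" in source


original = "\n".join([
    "struct A {};",
    "int unused;",
    "struct B : A {};",
    "void f();",
])
reduced = reduce_lines(original, fake_crash)
   </pre>
   <p>
     With the stand-in oracle, the four-line input shrinks to the two
     <tt>struct</tt> declarations. Rescanning from the start after every
     removal is quadratic in the number of lines, which is acceptable for a
     sketch but is one of the reasons to hand real reductions to a dedicated
     tool.
   </p>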
   <p><b>Confirmed Mentors:</b> Raphael Isemann, Shafik Yaghmour</p>
   <p><b>Desirable skills:</b> Intermediate knowledge of C++.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_subsubsection">
   <a name="improve-autocompletion">Improve shell autocompletion for Clang</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p><b>Description of the project: </b> Clang has a newly implemented autocompletion feature, whose details can be found on the <a href="http://blog.llvm.org/2017/09/clang-bash-better-auto-completion-is.html">LLVM blog</a>. We would like to improve this by adding more flags to autocompletion, supporting more shells (currently it supports only bash), and exporting this feature to other projects such as llvm-opt. The accepted student will be working on the Clang Driver, LLVM Options, and shell scripts.
   </p>

   <p><b>Expected Results:</b> Autocompletion working in bash and zsh, and support for llvm-opt options.</p>

   <p><b>Confirmed Mentors:</b> Yuka Takahashi and Vassil Vassilev</p>

   <p><b>Desirable skills:</b>
   Intermediate knowledge of C++ and shell scripting.
   </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc18">Google Summer of Code 2018</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p>Google Summer of Code 2018 contributed a lot to the LLVM project. For the list of
   accepted and completed projects, please take a look at the Google Summer of Code
   <a href="https://summerofcode.withgoogle.com/archive/2018/organizations/5263452624912384/">website</a>.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="gsoc17">Google Summer of Code 2017</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">
   <p>Google Summer of Code 2017 contributed a lot to the LLVM project. For the list of
   accepted and completed projects, please take a look at the Google Summer of Code
   <a href="https://summerofcode.withgoogle.com/archive/2017/organizations/6215410651234304/">website</a>.</p>
 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="what">What is this?</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">

 <p>This document is meant to be a sort of "big TODO list" for LLVM.  Each
 project in this document is something that would be useful for LLVM to have, and
 would also be a great way to get familiar with the system.  Some of these
 projects are small and self-contained, and may be implemented in a couple of
 days; others are larger.  Several of these projects may lead to interesting
 research projects in their own right.  In any case, we welcome all
 contributions.</p>

 <p>If you are thinking about tackling one of these projects, please send a mail
 to the <a href="http://lists.llvm.org/mailman/listinfo/llvm-dev">LLVM
 Developer's</a> mailing list, so that we know the project is being worked on.
 Additionally, this is a good way to get more information about a specific project
 or to suggest other projects to add to this page.
 </p>

 <p>The projects on this page are open-ended. More specific projects are
 filed as unassigned enhancements in the <a href="https://github.com/llvm/llvm-project/issues/">
 LLVM bug tracker</a>. See the
 <a href="https://github.com/llvm/llvm-project/issues?q=is%3Aopen+is%3Aissue+no%3Aassignee">list of currently outstanding issues</a>
 if you wish to help improve LLVM.</p>

 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="subprojects">LLVM Subprojects: Clang and More</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">

 <p>In addition to hacking on the main LLVM project, LLVM has several subprojects,
    including Clang and others.  If you are interested in working on these, please
    see their "Open projects" pages:</p>

 <ul>
 <li>The <a href="http://clang.llvm.org/OpenProjects.html">Clang Open
     Projects</a> list.</li>
 <li>The <a href="http://polly.llvm.org/projects.html">Polly Open
     Projects</a> list.</li>
 <li>The <a href="http://sva.cs.illinois.edu/projects.html">SAFECode Open
     Projects</a> list.</li>
 </ul>

 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="improving">Improving the current system</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">

 <p>Improvements to the current infrastructure are always very welcome and tend
 to be fairly straightforward to implement.  Here are some of the key areas that
 can use improvement...</p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="target-desc">Factor out target descriptions</a>
 </div>

 <div class="www_text">

 <p>Currently, both Clang and LLVM have a separate target description infrastructure,
 with some features duplicated, others "shared" (in the sense that Clang has to create
 a full LLVM target description to query specific information).</p>

 <p>This separation has grown in parallel, since in the beginning they were quite
 different and served disparate purposes. But as the compiler evolved, more and
 more features had to be shared between the two so that the compiler would behave
 properly. An example is when targets have default features on specific configurations
 that have no corresponding flags. If the back-end has a different "default" behaviour
 than the front-end and the latter has no way of enforcing behaviour, it
 won't work.</p>

 <p>An alternative would be to create flags for all little quirks, but first, Clang
 is not the only front-end or tool that uses LLVM's middle/back ends, and second,
 that's what "default behaviour" is there for, so we'd be missing the point.</p>

 <p>Several ideas have been floating around to fix the Clang driver with respect to recognizing
 architectures, features and so on (table-gen it, user-specific configuration files,
 etc) but none of them touch the critical issue: sharing that information with the
 back-end.</p>

 <p>Recently, the idea of factoring out the target description infrastructure from
 both Clang and LLVM into its own library that both use has been floating around.
 This would make sure that all defaults, flags and behaviour are shared, but would
 also reduce the complexity (and thus the cost of maintenance) a lot. That would
 also allow all tools (lli, llc, lld, lldb, etc) to have the same behaviour
 across the board.</p>

 <p>The main challenges are:</p>

 <ul>
   <li>To make sure the transition doesn't destroy the delicate balance on any
   target, as some defaults are implicit and, sometimes, unknown.</li>
   <li>To be able to migrate one target at a time, one tool at a time and still
   keep the old infrastructure intact.</li>
   <li>To make it easy to detect a target's features for both the front-end and
   the back-end, and to merge both into a coherent set of properties.</li>
   <li>To provide a bridge to the new system for tools that haven't migrated,
   especially the out-of-tree ones, which will need some time (one release,
   at least) to migrate.</li>
 </ul>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="code-cleanups">Implementing Code Cleanup bugs</a>
 </div>

 <div class="www_text">

 <p>
 The <a href="https://github.com/llvm/llvm-project/issues/">LLVM bug tracker</a> occasionally
 has <a href="https://github.com/llvm/llvm-project/labels/code-cleanup">"code-cleanup" bugs</a> filed in it.
 Taking one of these and fixing it is a good way to get your feet wet in the
 LLVM code and discover how some of its components work.  Some of these include
 some major IR redesign work, which is high-impact because it can simplify a lot
 of things in the optimizer.
 </p>

 <p>
 Some specific ones that would be great to have:

 <ul>
 <li><a href="/PR10367">Fix the design of GlobalAlias to not require dest type to match source type</a></li>
 <li><a href="/PR10368">Redesign ConstantExpr's</a></li>
 <li><a href="/PR11944">Static constructors should be purged from LLVM</a></li>
 </ul>
 </p>

 <p>Additionally, there are compile-time performance issues in LLVM that need to be
 fixed. These are marked with the <tt>slow-compile</tt> label. Use
 <a href="https://github.com/llvm/llvm-project/issues?q=is%3Aopen+is%3Aissue+label%3Aslow-compile">
 this LLVM bug tracker query</a>
 to find them.</p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="llvmtest">Add programs to the llvm-test testsuite</a>
 </div>

 <div class="www_text">

 <p>
 The <a href="docs/TestingGuide.html#wholeprograms">llvm-test</a> testsuite is
 a large collection of programs we use for nightly testing of generated code
 performance, compile times, correctness, etc.  Having a large testsuite gives
 us a lot of coverage of programs and enables us to spot and improve any
 problem areas in the compiler.</p>

 <p>
 One extremely useful task, which does not require in-depth knowledge of
 compilers, would be to extend our testsuite to include <a href=
 "http://nondot.org/sabre/LLVMNotes/#benchmarks">new programs and benchmarks</a>.
 In particular, we are interested in CPU-intensive programs that have few
 library dependencies, produce some output that can be used for correctness
 testing, and that are redistributable in source form.  Many different programs
 are suitable, for example, see <a
 href="http://nondot.org/sabre/LLVMNotes/#benchmarks">this list</a> for some
 potential candidates.
 </p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="programs">Compile programs with the LLVM Compiler</a>
 </div>

 <div class="www_text">

 <p>We are always looking for new testcases and benchmarks for use with LLVM.  In
 particular, it is useful to try compiling your favorite C source code with LLVM.
 If it doesn't compile, try to figure out why or report it to the <a
 href="http://lists.llvm.org/pipermail/llvm-bugs/">llvm-bugs</a> list.  If you
 get the program to compile, it would be extremely useful to convert the build
 system to be compatible with the LLVM Programs testsuite so that we can check it
 into the repository and the automated tester can use it to track progress of the
 compiler.</p>

 <p>When testing a program, try running it with a variety of optimizations, and with
 all the back-ends: CBE, llc, and lli.</p>

 </div>


 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="benchmark">Benchmark the LLVM compiler</a>
 </div>

 <div class="www_text">

 <p>Find benchmarks either using our <a
 href="/nightlytest/">test results</a> or on your own,
 where LLVM code generators do not produce optimal code or where another
 compiler produces better code.  Try to minimize the test case that demonstrates
 the issue.  Then, either <a href="https://github.com/llvm/llvm-project/issues/">submit a
 bug</a> with your testcase and the code that LLVM produces vs. the code that it
 <em>should</em> produce, or even better, see if you can improve the code
 generator and submit a patch.  The basic idea is that it's generally quite easy
 for us to fix performance problems if we know about them, but we generally don't
 have the resources to go finding out why performance is bad.</p>

 </div>


 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="statistics">Benchmark Statistics and Warning System</a>
 </div>

 <div class="www_text">

 <p>The <a href='http://llvm.org/perf/db_default/v4/nts/recent_activity'>
 LNT perf database</a> has some nice features, such as detecting moving averages,
 standard deviations, variations, etc. But the report page gives too much emphasis
 on the individual variation (where noise can be higher than signal), e.g.,
 <a href='http://llvm.org/perf/db_default/v4/nts/graph?plot.0=10.341.3&highlight_run=8943'>
 this case</a>.</p>

 <p>The first part of the project would be to create an analysis tool that would
 track moving averages and report:
 <ul>
  <li>If the current result is higher/lower than the previous moving average by
      more than (configurable) S standard deviations</li>
  <li>If the current moving average is more than S standard deviations away
      from the base run</li>
  <li>If the last A moving averages are in constant increase/decrease of more
      than P percent</li>
 </ul>
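
 <p>A minimal Python sketch of the three checks above; it is not part of LNT.
 The function names, the trailing window, and the choice to measure every
 deviation against the standard deviation of the previous window are
 assumptions made for illustration. The parameters <tt>s</tt>, <tt>a</tt>, and
 <tt>p</tt> correspond to S, A, and P in the list.</p>

```python
from statistics import mean, stdev


def moving_averages(samples, window):
    """Trailing moving average for every position that has a full window."""
    return [mean(samples[i - window:i]) for i in range(window, len(samples) + 1)]


def check_run(samples, base, window=5, s=2.0, a=3, p=5.0):
    """Return alert names for the newest entry of `samples`.

    `base` is the result of the base run. `window` must be at least 2,
    because a standard deviation needs two or more points.
    """
    alerts = []
    history, current = samples[:-1], samples[-1]
    if len(history) < window:
        return alerts  # not enough data to judge yet

    prev_window = history[-window:]
    prev_avg = mean(prev_window)
    sigma = stdev(prev_window)  # assumption: noise estimated from previous window

    # Check 1: newest result vs. the previous moving average.
    if abs(current - prev_avg) > s * sigma:
        alerts.append("outlier")

    # Check 2: current moving average vs. the base run.
    avgs = moving_averages(samples, window)
    if abs(avgs[-1] - base) > s * sigma:
        alerts.append("drift-from-base")

    # Check 3: the last `a` moving averages rise (or fall) at every step
    # and the total change exceeds `p` percent.
    recent = avgs[-a:]
    if a >= 2 and len(recent) == a and recent[0] != 0:
        steps = [later - earlier for earlier, later in zip(recent, recent[1:])]
        monotonic = all(d > 0 for d in steps) or all(d < 0 for d in steps)
        change = abs(recent[-1] - recent[0]) / abs(recent[0]) * 100
        if monotonic and change > p:
            alerts.append("trend")
    return alerts
```

 <p>A single noisy sample only triggers the first check, while a slow
 regression that stays inside the noise band of each run can still show up
 through the second and third checks.</p>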

 <p>The second part would be to create a web page which would show all related
 benchmarks (possibly configurable, like a dashboard) and show the basic statistics
 with red/yellow/green colour codes to show status and links to more detailed
 analysis of each benchmark.</p>

 <p>A possible third part would be to be able to automatically cross reference
 different builds, so that if you group them by architecture/compiler/number
 of CPUs, this automated tool would understand that the changes are more common
 to one particular group.</p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="coverage">Improving Coverage Reports</a>
 </div>

 <div class="www_text">

 <p>The <a href='http://llvm.org/reports/coverage/'>
 LLVM Coverage Report</a> has a nice interface to show what source lines are
 covered by the tests, but it doesn't mention which tests, which revision, and
 which architecture are covered.</p>

 <p>A project to renovate LCOV would involve:
 <ul>
  <li>Making it run on a buildbot, so that we know what commits / architectures
      are covered</li>
  <li>Updating the web page to show that information</li>
  <li>Developing a system that would report every buildbot build into the web page
      in a searchable database, like LNT</li>
 </ul>

 <p>Another idea is to enable the test suite to run all built backends, not only
    the host architecture, so that the coverage report can be built on a fast machine
    and produce one report per commit without needing to update the buildbots.</p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="misc_imp">Miscellaneous Improvements</a>
 </div>

 <div class="www_text">

 <ol>

 <li>Completely rewrite bugpoint.  In addition to being a mess, bugpoint suffers
 from a number of problems where it will "lose" a bug when reducing.  It should
 be rewritten from scratch to solve these and other problems.</li>
 <li><a href="https://llvm.org/PR2116">Add support for
 transactions to the PassManager</a> for improved bugpoint.</li>
 <li><a href="https://llvm.org/PR539">Improve bugpoint to
 support running tests in parallel on MP machines</a>.</li>
 <li>Add MC assembler/disassembler and JIT support to the SPARC port.</li>
 <li>Move more optimizations out of the <tt>-instcombine</tt> pass and into
 InstructionSimplify.  The optimizations that should be moved are those that
 do not create new instructions, for example turning <tt>sub i32 %x, 0</tt>
 into <tt>%x</tt>.  Many passes use InstructionSimplify to clean up code as
 they go, so making it smarter can result in improvements all over the place.</li>
 </ol>
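
 <p>To illustrate the kind of fold that belongs in InstructionSimplify, here is
 a small Python sketch over a toy representation. It is not LLVM's API: the
 <tt>simplify</tt> function, the string operand names, and the use of plain
 Python integers for constants are all assumptions. The property it
 demonstrates is that every result is either an existing operand or a
 constant, so no new instruction is ever created.</p>

```python
def simplify(op, lhs, rhs):
    """Try to fold `op lhs, rhs` without creating a new instruction.

    Operands are SSA names (strings such as "%x") or integer constants.
    Returns an existing operand or a constant, or None if nothing applies.
    """
    if op == "sub":
        if rhs == 0:
            return lhs          # sub %x, 0  ->  %x
        if lhs == rhs:
            return 0            # sub %x, %x ->  0
    elif op == "add":
        if rhs == 0:
            return lhs          # add %x, 0  ->  %x
        if lhs == 0:
            return rhs          # add 0, %x  ->  %x
    elif op == "mul":
        if lhs == 0 or rhs == 0:
            return 0            # mul %x, 0  ->  0
        if rhs == 1:
            return lhs          # mul %x, 1  ->  %x
        if lhs == 1:
            return rhs          # mul 1, %x  ->  %x
    elif op in ("and", "or"):
        if lhs == rhs:
            return lhs          # and %x, %x ->  %x (same for or)
    return None                 # needs a real rewrite; leave it to instcombine
```

 <p>A cleanup step in any pass could call such a function and, when the result
 is not <tt>None</tt>, replace all uses of the instruction with the returned
 value.</p>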

 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="new">Adding new capabilities to LLVM</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">

 <p>Sometimes creating new things is more fun than improving existing things.
 These projects tend to be more involved and perhaps require more work, but can
 also be very rewarding.</p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="llvm_ir">Extend the LLVM intermediate representation</a>
 </div>

 <div class="www_text">

 <p>Many proposed <a href="http://nondot.org/sabre/LLVMNotes/">extensions and
 improvements to LLVM core</a> are awaiting design and implementation.</p>

 <ol>
 <li><a href="http://nondot.org/sabre/LLVMNotes/DebugInfoImprovements.txt">Improvements
 for Debug Information Generation</a></li>
 <li><a href="/PR1269">EH support for non-call exceptions</a></li>
 <li>Many ideas for feature requests are stored in the LLVM issue tracker.  Search
 <a href="https://github.com/llvm/llvm-project/issues?q=is%3Aissue+is%3Aopen+label%3Anew-feature">for issues with the "new-feature" label</a>.</li>
 </ol>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="pointeranalysis">Pointer and Alias Analysis</a>
 </div>

 <div class="www_text">

 <p>We have a <a href="docs/AliasAnalysis.html">strong base for development</a> of
 both pointer-analysis-based optimizations and pointer analyses
 themselves.  We want to take advantage of this:</p>

 <ol>
 <li>The globals mod/ref pass does an inexpensive bottom-up context sensitive
   alias analysis.  There are some inexpensive things that we could do to better
   capture the effects of functions that access pointer arguments.  This can be
   really important for C++ methods, which spend lots of time accessing pointers
   off 'this'.</li>

 <li>The alias analysis API supports the getModRefBehavior method, which allows
   the implementation to give a detailed analysis of the functions. For example, we
   could implement <a href="/PR1604">full knowledge of
     printf/scanf</a> side effects, which would be useful.  This feature is in
   place but not being used for anything right now.</li>

 <li>We need some way to reason about errno.  Consider a loop like this:

 <pre>
     for (i = 0; i != n; ++i)
       x += sqrt(loopinvariant);
 </pre>

 <p>We'd like to transform this into:</p>

 <pre>
     t = sqrt(loopinvariant);
     for (i = 0; i != n; ++i)
       x += t;
 </pre>

 <p>This transformation is safe, because the value of errno isn't
 otherwise changed in the loop and the exit value of errno from the
 loop is the same.  We currently can't do this, because sqrt clobbers
 errno, so it isn't "readonly" or "readnone" and we don't have a good
 way to model this.</p>

 <p>The important part of this project is figuring out how to describe
 errno in the optimizer: each libc seems to #define errno to something
 different.  Maybe the solution is to have a __builtin_errno_addr() or
 something similar and change system headers to use it.</p>

 <li>There are lots of ways to optimize out and <a
 href="/PR452">improve handling of
 memcpy/memset</a>.</li>

 </ol>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="profileguided">Profile-Guided Optimization</a>
 </div>

 <div class="www_text">

 <p>We now have a unified infrastructure for writing profile-guided
 transformations, which will work either at offline-compile-time or in the JIT,
 but we don't have many transformations.  We would welcome new profile-guided
 transformations as well as improvements to the current profiling system.
 </p>

 <p>Ideas for profile-guided transformations:</p>

 <ol>
 <li>Superblock formation (with many optimizations)</li>
 <li>Loop unrolling/peeling</li>
 <li>Profile directed inlining</li>
 <li>Code layout</li>
 <li>...</li>
 </ol>

 <p>Improvements to the existing support:</p>

 <ol>
 <li>The current block and edge profiling code that gets inserted is very simple
 and inefficient.  Through the use of control-dependence information, many fewer
 counters could be inserted into the code.  Also, if the execution count of a
 loop is known to be a compile-time or runtime constant, all of the counters in
 the loop could be avoided.</li>

 <li>You could implement one of the "static profiling" algorithms which analyze a
 piece of code and make educated guesses about the relative execution frequencies
 of various parts of the code.</li>

 <li>You could add path profiling support, or adapt the existing LLVM path
 profiling code to work with the generic profiling interfaces.</li>
 </ol>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="compaction">Code Compaction</a>
 </div>

 <div class="www_text">
 <p>LLVM aggressively optimizes for performance, but does not yet optimize for code size.
 With a new ARM backend, there is increasing interest in using LLVM for embedded systems
 where code size is more of an issue.
 </p>

 <p>Someone interested in implementing code compaction in LLVM might want to read
 <a href="http://citeseer.ist.psu.edu/425696.html">this</a> article, which describes using
 link-time optimization to reduce code size.
 </p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="xforms">New Transformations and Analyses</a>
 </div>

 <div class="www_text">

 <ol>
   <li>Implement a Loop Dependence Analysis Infrastructure<br>
     - Design some way to represent and query dep analysis</li>
   <li>Value range propagation pass</li>
   <li>More fun with loops:
     <a href="http://www.cs.ualberta.ca/~amaral/cascon/CDP04/tal.html">
       Predictive Commoning
     </a>
   </li>
   <li>Type inference (a.k.a. devirtualization)</li>
   <li><a href="http://nondot.org/sabre/LLVMNotes/BuiltinUnreachable.txt">Value
       assertions</a> (also <a href="/PR810">PR810</a>).</li>
 </ol>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="codegen">Code Generator Improvements</a>
 </div>

 <div class="www_text">

 <ol>
 <li>Generalize target-specific backend passes that could be target-independent,
     by adding necessary target hooks and making sure all IR/MI features (such as
     register masks and predicated instructions) are properly handled. Enable these
     for other targets where doing so is demonstrably beneficial.
     For example:
       <ol><li>lib/Target/Hexagon/RDF*</li>
           <li>lib/Target/AArch64/AArch64AddressTypePromotion.cpp</li>
      </ol>
     </li>
 <li>Merge the delay slot filling logic that is duplicated in (at least)
     the Sparc and Mips backends into a single target-independent pass.
     Likewise, the branch shortening logic in several targets should be merged
     together into one pass.</li>
 <li>Implement 'stack slot coloring' to allocate two frame indexes to the same
     stack offset if their live ranges don't overlap.  This can reuse a bunch of
     analysis machinery from LiveIntervals.  Making the stack smaller is good
     for cache use and very important on targets where loads have limited
     displacement like ppc, thumb, mips, sparc, etc.  This should be done as
     a pass before prolog epilog insertion.  This is now done for register
     allocator temporaries, but not for allocas.</li>
 <li>Implement 'shrink wrapping', which is the intelligent placement of callee
     saved register save/restores.  Right now PrologEpilogInsertion always saves
     every (modified) callee save reg in the prolog and restores it in the
     epilog; however, some paths through a function (e.g. an early exit) may
     not use all regs.  Sinking the save down the CFG avoids useless work on
     these paths. Work has started on this, please inquire on llvm-dev.</li>
 <li>Implement interprocedural register allocation. The CallGraphSCCPass can be
     used to implement a bottom-up analysis that will determine the *actual*
     registers clobbered by a function. Use the pass to fine tune register usage
     in callers based on *actual* registers used by the callee.</li>
 <li>Add support for 16-bit x86 assembly and real mode to the assembler and
     disassembler, for use by BIOS code. This includes 16-bit instruction
     encodings as well as privileged instructions (lgdt, lldt, ltr, lmsw, clts,
     invd, invlpg, wbinvd, hlt, rdmsr, wrmsr, rdpmc, rdtsc) and the control and
     debug registers.</li>
 </ol>
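
 <p>The stack slot coloring idea from the list above can be sketched as a
 greedy interval assignment. This Python sketch is an illustration only and
 does not use LiveIntervals or any LLVM data structure: the
 <tt>color_stack_slots</tt> function, the half-open live ranges, and the
 decision to ignore alignment are assumptions.</p>

```python
def color_stack_slots(objects):
    """Assign frame objects to shared stack slots.

    `objects` maps a name to (start, end, size), where [start, end) is the
    live range. Returns ({name: offset}, total_frame_size). Two objects get
    the same offset only if their live ranges do not overlap.
    """
    slots = []  # each slot: {"free_at": int, "size": int, "members": [names]}
    # Visit objects in order of increasing start so that `free_at` (the end of
    # the last object placed in a slot) is enough to detect overlap.
    for name, (start, end, size) in sorted(objects.items(), key=lambda kv: kv[1][0]):
        for slot in slots:
            if slot["free_at"] <= start:
                break  # the previous occupant is dead; reuse this slot
        else:
            slot = {"free_at": 0, "size": 0, "members": []}
            slots.append(slot)
        slot["free_at"] = end
        slot["size"] = max(slot["size"], size)  # slot must fit its largest member
        slot["members"].append(name)

    # Lay the slots out back to back (alignment is ignored in this sketch).
    offsets, offset = {}, 0
    for slot in slots:
        for name in slot["members"]:
            offsets[name] = offset
        offset += slot["size"]
    return offsets, offset
```

 <p>For four objects of sizes 8, 4, 4, and 16 bytes where only two are live at
 any time, this layout needs two slots instead of four, which keeps more
 accesses within a limited load displacement.</p>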

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="misc_new">Miscellaneous Additions</a>
 </div>

 <div class="www_text">

 <ol>
 <li>Port the <a href="http://www-sop.inria.fr/mimosa/fp/Bigloo/">Bigloo</A>
 Scheme compiler, from Manuel Serrano at INRIA Sophia-Antipolis, to
 output LLVM bitcode. It seems that it can already output .NET
 bytecode, JVM bytecode, and C, so LLVM would ostensibly be another good
 candidate.</li>
 <li>Write a new frontend for some other language (Java? OCaml? Forth?)</li>
 <li>Random test vector generator: Use a C grammar to generate random C code,
 e.g., <a href="http://code.google.com/p/quest-tester/">quest</a>;
 run it through llvm-gcc, then run a random set of passes on it using opt.
 Try to crash <tt><a href="/docs/CommandGuide/html/opt.html">opt</a></tt>. When
 <tt>opt</tt> crashes, use <tt><a
 href="/docs/CommandGuide/html/bugpoint.html">bugpoint</a></tt> to reduce the
 test case and post it to a website or mailing list.  Repeat ad infinitum.</li>
 <li>Add sandbox features to the Interpreter: catch invalid memory accesses,
   potentially unsafe operations (access via arbitrary memory pointer) etc.
 </li>
 <li>Port <a href="http://valgrind.org">Valgrind</a> to use LLVM code generation
   and optimization passes instead of its own.</li>
 <li>Write an LLVM IR-level debugger (extend Interpreter?)</li>
 <li>Write an LLVM Superoptimizer.  It would be interesting to take ideas from
     this superoptimizer for x86:
 <a href="http://theory.stanford.edu/~aiken/publications/papers/asplos06.pdf">paper #1</a> and <a href="http://theory.stanford.edu/~sbansal/superoptimizer.html">paper #2</a> and adapt them to run on LLVM code.<p>

 It would seem that operating on LLVM code would save a lot of time
 because its semantics are much simpler than x86.  The cost of operating
 on LLVM is that target-specific tricks would be missed.<p>

 The outcome would be a new LLVM pass that subsumes at least the
 instruction combiner, and probably a few other passes as well.  Benefits
 would include catching cases missed by the current combiner and
 adapting more easily to changes in the LLVM IR.<p>

 All previous superoptimizers have worked on linear sequences of code.
 It would seem much better to operate on small subgraphs of the program
 dependency graph.</li>
 </ol>

 </div>

 <!-- *********************************************************************** -->
 <div class="www_sectiontitle">
   <a name="using">Projects using LLVM</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="www_text">

   <p>
   In addition to projects that enhance the existing LLVM infrastructure, there
   are projects that improve software that uses, but is not included with, the
   LLVM compiler infrastructure.  These projects include open-source software
   projects and research projects that use LLVM.  Like projects that enhance the
   core LLVM infrastructure, these projects are often challenging and rewarding.
   </p>

 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="encodeanalysis">Encode Analysis Results in MachineInstr IR</a>
 </div>

 <div class="www_text">
   <p>
   At least one project (and probably more) needs to use analysis information
   (such as call graph analysis) from within a MachineFunctionPass; however,
   most analysis passes operate at the LLVM IR level.  In some cases, a value
   (e.g., a function pointer) cannot be mapped from the MachineInstr level back
   to the LLVM IR level reliably, making the use of existing LLVM analysis
   passes from within a MachineFunctionPass impossible (or at least brittle).
   </p>

   <p>
   This project is to encode analysis information from the LLVM IR level into
   the MachineInstr IR when it is generated so that it is available to a
   MachineFunctionPass.  The exemplar is call graph analysis (useful for
   control-flow integrity instrumentation, analysis of code reuse defenses, and
   gadget compilers); however, other LLVM analyses may be useful.
   </p>
 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="codelayoutjit">Code Layout in the LLVM JIT</a>
 </div>

 <div class="www_text">
   <p>
   Implement an on-demand function relocator in the LLVM JIT. This can help
   improve code locality using runtime profiling information. The idea is to use
   a relocation table for every function.  The relocation entries need to be
   updated upon every function relocation (take a look at
   <a href="https://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf">
   this article</a>).
   A (per-function) basic block reordering would be a useful extension.
   </p>
 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="fieldlayout">Improved Structure Splitting and Field Reordering</a>
 </div>

 <div class="www_text">
   <p>
   The goal of this project is to implement better data layout optimizations
   using the model of reference affinity.  This
   <a href="http://www.cs.rochester.edu/~cding/Documents/Publications/pldi04.pdf">
   paper</a>
   provides some background information.
   </p>
 </div>

 <!-- ======================================================================= -->
 <div class="www_subsubsection">
   <a name="slimmer">Finish the Slimmer Project</a>
 </div>

 <div class="www_text">
   <p>
   Slimmer is a prototype tool, built using LLVM, that uses dynamic analysis to
   find potential performance bugs in programs.  Development on Slimmer started
   during Google Summer of Code in 2015 and resulted in an initial prototype,
   but evaluation of the prototype and improvements to make it portable and
   robust are still needed.  This project would have a student pick up and
   finish the Slimmer work.  The source code of Slimmer and
   its current documentation can be found at its
   <a href="https://github.com/james0zan/Slimmer">GitHub</a> web page.
   </p>
 </div>

 <!-- *********************************************************************** -->

 <hr>

 <!--#include virtual="footer.incl" -->