<!--#include virtual="header.incl" -->
<div class="www_sectiontitle">Open LLVM Projects</div>
<ul>
<li>Google Summer of Code Ideas & Projects
<ul>
<li>
<a href="#gsoc24">Google Summer of Code 2024</a>
<ul>
<li><b>LLVM Core</b>
<ul>
<li><a href="#remove_ub_tests">Remove undefined behavior from tests</a></li>
<li><a href="#spirv_tablegen">Automatically generate TableGen file for SPIR-V instruction set</a></li>
<li><a href="#bitstream_cas">LLVM bitstream integration with CAS (content-addressable storage)</a></li>
<li><a href="#three_way_comparison">Add 3-way comparison intrinsics</a></li>
<li><a href="#llvm_www">Improve the LLVM.org Website Look and Feel</a></li>
<li><a href="#parameter-tuning">The 1001 thresholds in LLVM</a></li>
</ul>
<li><a href="http://clang.llvm.org/"><b>Clang</b></a>
<ul>
<li><a href="#clang-repl-out-of-process">Out-of-process execution for clang-repl</a></li>
<li><a href="#clang-plugins-windows">Support clang plugins on Windows</a></li>
<li><a href="#clang-on-demand-parsing">On Demand Parsing in Clang</a></li>
<li><a href="#clang-doc-improve-usability">Improve Clang-Doc Usability</a></li>
</ul>
<li><a href="http://lldb.llvm.org/"><b>LLDB</b></a>
<ul>
<li><a href="#rich-disassembler-for-lldb">Rich disassembler for LLDB</a></li>
</ul>
<li><a href="http://openmp.llvm.org/"><b>(OpenMP) Offload</b></a>
<ul>
<li><a href="#gpu-delta-debugging">GPU Delta Debugging</a></li>
<li><a href="#offload-libcxx">Offloading libcxx</a></li>
<li><a href="#gpu-libc">Performance tuning the GPU libc</a></li>
<li><a href="#gpu-first">Improve GPU First Framework</a></li>
</ul>
<li><a href="https://clangir.org"><b>ClangIR</b></a>
<ul>
<li><a href="#clangir-gpu">Compile GPU kernels using ClangIR</a></li>
</ul>
<li><a href="http://libc.llvm.org/"><b>LLVM libc</b></a>
<ul>
<li><a href="#half-precision-libc">Half precision in LLVM libc</a></li>
</ul>
</ul>
</li>
<li>
<a href="#gsoc23">Google Summer of Code 2023</a>
<ul>
<li>
<b>LLVM Core</b>
<ul>
<li><a href="#llvm_new_jitlink_reopt">Re-optimization using JITLink</a></li>
<li><a href="#llvm_new_jitlink_backends">JITLink new backends</a></li>
<li><a href="#llvm_improving_compile_times">Improving compile times</a></li>
<li><a href="#llvm_addressing_rust_optimization_failures">Addressing Rust optimization failures</a></li>
<li><a href="#llvm_mlgo_latency_model">Better performance models for MLGO training</a></li>
<li><a href="#llvm_mlgo_passes_2023">Machine Learning Guided Ordering of Compiler Optimization Passes</a></li>
<li><a href="#llvm_map_value_to_src_expr">Map LLVM values to corresponding source-level expressions</a></li>
</ul>
</li>
<li><a href="http://clang.llvm.org/"><b>Clang</b></a>
<ul>
<li><a href="#clang-repl-out-of-process">Out-of-process execution for clang-repl</a></li>
<li><a href="#clang_analyzer_taint_analysis">Improve and Stabilize the Clang Static Analyzer's "Taint Analysis" Checks</a></li>
<li><a href="#clang-repl-autocompletion">Implement autocompletion in clang-repl</a></li>
<li><a href="#clang-modules-build-daemon">Modules build daemon: build system agnostic support for explicitly built modules</a></li>
<li><a href="#clang-extract-api-categories">ExtractAPI Objective-C categories</a></li>
<li><a href="#clang-extract-api-cpp-support">ExtractAPI C++ Support</a></li>
<li><a href="#clang-extract-api-while-building">ExtractAPI while building</a></li>
<li><a href="#clang-improve-diagnostics2">Improve Clang diagnostics</a></li>
<li><a href="#clang-tutorials-clang-repl">Tutorial development with clang-repl</a></li>
<li><a href="#clang-repl-wasm">Add WebAssembly Support in clang-repl</a></li>
</ul>
</li>
<li>
<b>LLD</b>
<ul>
<li><a href="#llvm_lld_embedded">LLD Linker Improvements for Embedded Targets</a></li>
</ul>
</li>
<li>
<b>MLIR</b>
<ul>
<li><a href="#llvm_mlir_presburger_opt">Optimizing MLIR’s Presburger library</a></li>
<li><a href="#llvm_mlir_query">Interactively query MLIR IR</a></li>
</ul>
</li>
<li>
<b>Code Coverage</b>
<ul>
<li><a href="#llvm_code_coverage">Support a hierarchical directory structure in generated coverage html reports</a></li>
<li><a href="#llvm_patch_coverage">Patch based test coverage for quick test feedback</a></li>
</ul>
</li>
<li>
<b>ClangIR</b>
<ul>
<li><a href="#clangir">Build and run SingleSource benchmarks using ClangIR</a></li>
</ul>
</li>
<li>
<b><a href="https://enzyme.mit.edu">Enzyme</a></b>
<ul>
<li><a href="#enzyme_tblgen_extension">Move additional Enzyme Rules to Tablegen</a></li>
</ul>
</li>
</ul>
</li>
<li>
<a href="#gsoc22">Google Summer of Code 2022</a>
<ul>
<li>
<b>LLVM Core</b>
<ul>
<li><a href="#llvm_shared_jitlink">Implement a shared-memory based JITLinkMemoryManager for out-of-process JITting</a></li>
<li><a href="#llvm_build_jit_tutorial">Modernize the LLVM "Building A JIT" tutorial series</a></li>
<li><a href="#llvm_jit_new_format">Write JITLink support for a new format/architecture</a></li>
<li><a href="#llvm_instrumentaion_for_compile_time">Instrumentation of Clang/LLVM for Compile Time</a></li>
<li><a href="#llvm_lto_dependency_info">Richer symbol dependency information for LTO</a></li>
<li><a href="#llvm_mlgo_passes">Machine Learning Guided Ordering of Compiler Optimization Passes</a></li>
<li><a href="#llvm_mlgo_loop">Learning Loop Transformation Heuristics</a></li>
<li><a href="#llvm_module_inliner">Evaluate and Expand the Module-Level Inliner</a></li>
<li><a href="#llvm_undef_load">Remove undef: move uninitialized memory to poison</a></li>
<li><a href="#llvm_abi_export">Add ABI/API export annotations to the LLVM build</a></li>
</ul>
</li>
<li><a href="http://clang.llvm.org/"><b>Clang</b></a>
<ul>
<li><a href="#clang-template-instantiation-sugar">Extend clang AST to
provide information for the type as written in template
instantiations</a>
</li>
<li><a href="#clang-sa-structured-bindings">Implement support for
C++17 structured bindings in the Clang Static Analyzer</a>
</li>
<li><a href="#clang-improve-diagnostics">Improve Clang Diagnostics</a>
</li>
</ul>
</li>
<li>
<a href="https://polly.llvm.org"><b>Polly</b></a>
<ul>
<li><a href="#polly_npm">Completely switch to new pass manager</a></li>
</ul>
</li>
<li>
<b><a href="https://enzyme.mit.edu">Enzyme</a></b>
<ul>
<li><a href="#enzyme_tblgen">Move Enzyme Instruction Transformation Rules to Tablegen</a></li>
<li><a href="#enzyme_vector">Vector Reverse-Mode Automatic Differentiation</a></li>
<li><a href="#enzyme_pm">Enable The New Pass Manager</a></li>
</ul>
</li>
</ul>
</li>
<li>
<a href="#gsoc21">Google Summer of Code 2021</a>
<ul>
<li>
<b>LLVM Core</b>
<ul>
<li><a href="#llvm_distributing_lit">Distributed lit testing</a></li>
<li><a href="#llvm_loop_heuristics">Learning Loop Transformation Heuristics</a></li>
<li><a href="#llvm_ir_fuzzing">Fuzzing LLVM-IR Passes</a></li>
<li><a href="#llvm_ir_assume"><tt>llvm.assume</tt> the missing pieces</a></li>
<li><a href="#llvm_shared_jitlink">Implement a shared-memory based JITLinkMemoryManager for out-of-process JITting</a></li>
<li><a href="#llvm_build_jit_tutorial">Modernize the LLVM "Building A JIT" tutorial series</a></li>
<li><a href="#llvm_jit_new_format">Write JITLink support for a new format/architecture</a></li>
<li><a href="#llvm_ir_issues">Fix fundamental issues in LLVM's IR</a></li>
<li><a href="#llvm_utilize_loopnest">Utilize LoopNest Pass</a></li>
</ul>
</li>
<li><a href="http://clang.llvm.org/"><b>Clang</b></a>
<ul>
<li><a href="#clang-template-instantiation-sugar">Extend clang AST to
provide information for the type as written in template
instantiations</a>
</li>
</ul>
</li>
<li>
<b>OpenMP</b>
<ul>
<li><a href="#openmp_gpu_jit">JIT-ing OpenMP GPU kernels transparently</a></li>
</ul>
</li>
<li>
<b>OpenACC</b>
<ul>
<li><a href="#openacc_rt_diagnostics">OpenACC Diagnostics from the OpenMP Runtime</a></li>
</ul>
</li>
<li>
<b><a href="https://polly.llvm.org">Polly</a></b>
<ul>
<li><a href="#polly_isl_bindings">Use official isl C++ bindings</a></li>
</ul>
</li>
<li>
<b><a href="https://enzyme.mit.edu">Enzyme</a></b>
<ul>
<li><a href="#enzyme_blas">Integrate custom derivatives of BLAS, Eigen, and similar routines into Enzyme</a></li>
<li><a href="#enzyme_swift">Integrate Enzyme into Swift to provide high-performance differentiation in Swift</a></li>
<li><a href="#enzyme_fixed">Differentiation of Fixed-Point Arithmetic</a></li>
<li><a href="#enzyme_rust">Integrate Enzyme into Rust to provide high-performance differentiation in Rust</a></li>
</ul>
</li>
<li>
<b>Clang Static Analyzer</b>
<ul>
<li><a href="#static_analyzer_profling">Clang Static Analyzer performance profiling</a></li>
<li><a href="#static_analyzer_constraint_solver">Clang Static Analyzer constraint solver improvements</a></li>
</ul>
</li>
<li>
<b>LLDB</b>
<ul>
<li><a href="#lldb_diagnostics">A structured approach to diagnostics in LLDB</a></li>
</ul>
</li>
</ul>
</li>
<li>
<a href="#gsoc20">Google Summer of Code 2020</a>
<ul>
<li>
<b>LLVM Core</b>
<ul>
<li><a href="#llvm_optimized_debugging">Improve debugging of optimized code</a></li>
<li><a href="#llvm_ipo">Improve inter-procedural analyses and optimizations</a></li>
<li><a href="#llvm_par">Improve parallelism-aware analyses and optimizations</a></li>
<li><a href="#llvm_dbg_invariant">Make LLVM passes debug info invariant</a></li>
<li><a href="#llvm_mergesim">Improve MergeFunctions to incorporate MergeSimilarFunction patches and ThinLTO Support</a></li>
<li><a href="#llvm_dwarf_yaml2obj">Add DWARF support to yaml2obj</a></li>
<li><a href="#llvm_hotcold">Improve hot cold splitting to aggressively outline small blocks</a></li>
<li><a href="#llvm_pass_order">Advanced Heuristics for Ordering Compiler Optimization Passes</a></li>
<li><a href="#llvm_ml_scc">Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations</a></li>
<li><a href="#llvm_postdominators">Add PostDominatorTree in LoopStandardAnalysisResults</a></li>
<li><a href="#llvm_loopnest">Create loop nest pass</a></li>
<li><a href="#llvm_instdump">Instruction properties dumper and checker</a></li>
<li><a href="#llvm_movecode">Unify ways to move code or check if code is safe to be moved</a></li>
</ul>
<li><a href="http://clang.llvm.org/"><b>Clang</b></a>
<ul>
<li><a href="#clang-template-instantiation-sugar">Extend clang AST to
provide information for the type as written in template
instantiations</a>
</li>
<li><a href="#clang-sa-cplusplus-checkers">Find null smart pointer dereferences
with the Static Analyzer</a>
</li>
</ul>
</li>
<li><a href="http://lldb.llvm.org/"><b>LLDB</b></a></li>
<ul>
<li><a href="#lldb-autosuggestions">Support autosuggestions in LLDB's command line</a></li>
<li><a href="#lldb-more-completions">Implement the missing tab completions for LLDB's command line</a></li>
<li><a href="#lldb-reimplement-lldb-cmdline">Reimplement LLDB's command-line commands using the public SB API.</a></li>
<li><a href="#lldb-batch-testing">Add support for batch-testing to the LLDB testsuite.</a></li>
</ul>
<li>
<b>MLIR</b>
<ul>
<li>See the <a href="https://mlir.llvm.org/getting_started/openprojects/">MLIR open project list</a></li>
</ul>
</li>
</ul>
</li>
<li>
<a href="#gsoc19">Google Summer of Code 2019</a>
<ul>
<li>
<b>LLVM Core</b>
<ul>
<li><a href="#debuginfo_codegen_mismatch">Debug Info should have no
effect on codegen</a></li>
<li><a href="#llvm_function_attributes">Improve (function) attribute
inference</a></li>
<li><a href="#improve_binary_utilities">Improve LLVM binary utilities
</a></li>
</ul>
</li>
<li><a href="http://clang.llvm.org/"><b>Clang</b></a>
<ul>
<li><a href="#clang-astimporter-fuzzer">Implement an ASTImporter
fuzzer</a>
</li>
<li><a href="#improve-autocompletion">Improve shell autocompletion
for Clang</a>
</li>
<li><a href="#analyze-llvm">Apply the Clang Static Analyzer to LLVM-based
Projects</a>
</li>
<li><a href="#header-generation">Generate annotated sources based on
LLVM-IR analyses</a>
</li>
</ul>
</li>
</ul>
</li>
<li><a href="#gsoc18">Google Summer of Code 2018</a></li>
<li><a href="#gsoc17">Google Summer of Code 2017</a></li>
</ul>
</li>
<li><a href="#what">What is this?</a></li>
<li><a href="#subprojects">LLVM Subprojects: Clang and more</a></li>
<li><a href="#improving">Improving the current system</a>
<ol>
<li><a href="#target-desc">Factor out target descriptions</a></li>
<li><a href="#code-cleanups">Implementing Code Cleanup bugs</a></li>
<li><a href="#programs">Compile programs with the LLVM Compiler</a></li>
<li><a href="#llvmtest">Add programs to the llvm-test suite</a></li>
<li><a href="#benchmark">Benchmark the LLVM compiler</a></li>
<li><a href="#statistics">Benchmark Statistics and Warning System</a></li>
<li><a href="#coverage">Improving Coverage Reports</a></li>
<li><a href="#misc_imp">Miscellaneous Improvements</a></li>
</ol></li>
<li><a href="#new">Adding new capabilities to LLVM</a>
<ol>
<li><a href="#llvm_ir">Extend the LLVM intermediate representation</a></li>
<li><a href="#pointeranalysis">Pointer and Alias Analysis</a></li>
<li><a href="#profileguided">Profile-Guided Optimization</a></li>
<li><a href="#compaction">Code Compaction</a></li>
<li><a href="#xforms">New Transformations and Analyses</a></li>
<li><a href="#codegen">Code Generator Improvements</a></li>
<li><a href="#misc_new">Miscellaneous Additions</a></li>
</ol></li>
<li><a href="#using">Project using LLVM</a>
<ol>
<li><a href="#machinemodulepass">Add a MachineModulePass</a></li>
<li><a href="#encodeanalysis">Encode Analysis Results in MachineInstr IR</a></li>
<li><a href="#codelayoutjit">Code Layout in the LLVM JIT</a></li>
<li><a href="#fieldlayout">Improved Structure Splitting and Field Reordering</a></li>
<li><a href="#slimmer">Finish the Slimmer Project</a></li>
</ol></li>
</ul>
<div class="doc_author">
<p>Written by the <a href="/">LLVM Team</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_sectiontitle">
<a name="gsoc24">Google Summer of Code 2024</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p>
Welcome, prospective Google Summer of Code 2024 students! This document is
your starting point for finding interesting and important projects for LLVM,
Clang, and other related sub-projects. These are not only Google Summer of
Code projects, but open projects that need developers to work on and
that would be very beneficial for the LLVM community.
</p>
<p>We encourage you to look through this list and see which projects excite
you and match well with your skill set. We also invite proposals not on this
list. More information and discussion about GSoC can be found on
<a href="https://discourse.llvm.org/c/community/gsoc" target="_blank">
Discourse
</a>. If you have questions about a particular project, please find the
relevant entry on Discourse, check the previous discussion, and ask. If there is
no such entry, or you would like to propose an idea, please create a new
entry. Feedback from the community is a requirement for your proposal to be
considered and hopefully accepted.
</p>
<p>The LLVM project has participated in Google Summer of Code for several years
and has had some very successful projects. We hope that this year is no
different and look forward to hearing your proposals. For information on how
to submit a proposal, please visit the Google Summer of Code main
<a href="https://summerofcode.withgoogle.com/">website.</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="remove_ub_tests">Remove undefined behavior from tests</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
Many of LLVM's unit tests have been reduced automatically from larger tests.
Previous-generation reduction tools used undef and poison as placeholders
everywhere, as well as introduced undefined behavior (UB).
Tests with UB are undesirable because 1) they are fragile, since the
compiler may start optimizing more aggressively in the future and break the
test, and 2) they break translation validation tools such as
<a href="https://github.com/AliveToolkit/alive2/">Alive2</a> (since it is
correct to translate a function that is always UB into anything).
<br />
The major steps include:
<ol>
<li>Replace known patterns such as branch on undef/poison, memory accesses
with invalid pointers, etc., with non-UB patterns.</li>
<li>Use Alive2 to detect further patterns (by searching for tests that are
always UB).</li>
<li>Report any LLVM bug found by Alive2 that is exposed when removing
UB.</li>
</ol>
</p>
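<p>
The first step above can be sketched as a small script. This is a minimal
illustration only, assuming tests are textual <tt>.ll</tt> files; the
pattern list and the <tt>%cond</tt> rewrite are hypothetical, not the actual
tooling the project would use.
</p>

```python
import re

# Sketch of step 1: rewrite "branch on undef/poison" -- one of the known UB
# patterns -- into a branch on a named condition, which the test author would
# then thread through as a function parameter.
BRANCH_ON_UNDEF = re.compile(r"br i1 (undef|poison),")

def fix_branch_on_undef(ir_text: str) -> str:
    # Replace the undef/poison branch condition with a placeholder value.
    return BRANCH_ON_UNDEF.sub("br i1 %cond,", ir_text)

print(fix_branch_on_undef("br i1 undef, label %a, label %b"))
```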
<p><b>Expected result:</b>
The majority of LLVM's unit tests will be free of UB.</p>
<p><b>Skills:</b>
Experience with scripting (Python or PHP) is required.
Experience with regular expressions is encouraged.
</p>
<p><b>Project size:</b> Either medium or large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b> <a href="https://web.ist.utl.pt/nuno.lopes/">Nuno Lopes</a></p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/gsoc-2004-remove-undefined-behavior-from-tests/77236">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="spirv_tablegen">Automatically generate TableGen file for SPIR-V instruction set</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The existing file that describes the SPIR-V instruction set in LLVM was
manually created and is not always complete or up to date. Whenever new
instructions need to be added to the SPIR-V backend, the file must be
amended. In addition, since it is not created in a systematic way, there are
often slight discrepancies between how an instruction is described in the
SPIR-V spec and how it is declared in the TableGen file. Since SPIR-V
backend developers often use the spec as a reference when developing new
features, having a consistent mapping between the specification and TableGen
records will ease development. This project proposes creating a script
capable of generating a complete TableGen file that describes the SPIR-V
instruction set given the JSON grammar available in the
KhronosGroup/SPIRV-Headers repository, and updating SPIR-V backend code to
use the new definitions. The specific method used for translating the JSON
grammar to TableGen is left up to the discretion of the applicant, however,
it should be checked into the LLVM repository with well-documented
instructions to replicate the translation process so that future maintainers
will be able to regenerate the file when the grammar changes. Note that the
grammar itself should remain out-of-tree in its existing separate
repository.
</p>
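<p>
The JSON-grammar-to-TableGen translation could be sketched as follows. The
input mirrors the general shape of the SPIRV-Headers grammar (instructions
carrying an <tt>opname</tt> and an <tt>opcode</tt>), but the emitted record
format here is purely illustrative and would need to match the definitions
the SPIR-V backend actually expects.
</p>

```python
import json

# Tiny stand-in for the JSON grammar in KhronosGroup/SPIRV-Headers.
grammar = json.loads("""
{
  "instructions": [
    {"opname": "OpNop", "opcode": 0},
    {"opname": "OpUndef", "opcode": 1}
  ]
}
""")

def emit_tablegen(grammar: dict) -> str:
    # Emit one TableGen record per instruction (hypothetical record format).
    lines = []
    for inst in grammar["instructions"]:
        name = inst["opname"]
        lines.append(f'def {name} : Op<{inst["opcode"]}, "{name}">;')
    return "\n".join(lines)

print(emit_tablegen(grammar))
```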
<p><b>Expected result:</b>
<ul>
<li>The SPIR-V instruction set's definition in TableGen is replaced with
one that is autogenerated.</li>
<li>A script and documentation are written that support regenerating the
definitions as needed given the JSON grammar of the SPIR-V instruction
set.</li>
<li>Usage of the SPIR-V instruction set in the SPIR-V backend is updated to
use the new autogenerated definitions.</li>
</ul>
</p>
<p><b>Skills:</b>
Experience with scripting and an intermediate knowledge of C++. Previous
experience with LLVM/TableGen is a bonus but not required.
</p>
<p><b>Project size:</b> Medium (175 hours)</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/sudonatalie/">Natalie Chouinard</a>,
<a href="https://github.com/keenuts/">Nathan Gauër</a></p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/clang-automatically-generate-tablegen-file-for-spir-v-instruction-set/76369">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="bitstream_cas">LLVM bitstream integration with CAS (content-addressable storage)</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The LLVM bitstream file format is used for serialization of intermediate
compiler artifacts, such as LLVM IR or Clang modules. There are situations
where multiple bitstream files store identical information, and this
duplication leads to increased storage requirements.
<br><br>
This project aims to integrate the LLVM CAS library into the LLVM bitstream
file format. If we factor out the frequently duplicated part of a bitstream
file into a separate CAS object, we can replace all copies with a small
reference to the canonical CAS object, saving storage.
<br><br>
The primary motivating use-case for this project is the dependency scanner
that's powering "implicitly-discovered, explicitly-built" Clang modules.
There are real-world situations where even coarse de-duplication on the
block level could halve the size of the scanning module cache.
</p>
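<p>
The deduplication idea at the heart of this project can be modeled in a few
lines. This is a toy sketch only; the real LLVM CAS library has its own
object model and API.
</p>

```python
import hashlib

# Toy content-addressable storage: objects are keyed by the hash of their
# contents, so identical blocks collapse to a single stored copy and every
# user of the block holds only a small reference.
class CAS:
    def __init__(self):
        self.objects = {}

    def store(self, data: bytes) -> str:
        ref = hashlib.sha256(data).hexdigest()
        self.objects[ref] = data  # duplicates overwrite with identical data
        return ref

cas = CAS()
a = cas.store(b"common bitstream block")
b = cas.store(b"common bitstream block")  # duplicate: same reference back
assert a == b and len(cas.objects) == 1
```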
<p><b>Expected result:</b>
There's a way to configure the LLVM bitstream writer/reader to use CAS as
the backing storage.
</p>
<p><b>Skills:</b>
Intermediate knowledge of C++, some familiarity with data serialization, self-motivation.
</p>
<p><b>Project size:</b> Medium or large</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/jansvoboda11/">Jan Svoboda</a>,
<a href="https://github.com/cachemeifyoucan/">Steven Wu</a></p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/llvm-bitstream-integration-with-cas-content-addressable-storage/76757">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="three_way_comparison">Add 3-way comparison intrinsics</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
<a href="https://en.wikipedia.org/wiki/Three-way_comparison">3-way comparisons</a>
return the values -1, 0 or 1 depending on whether the values compare lower,
equal or greater. They are exposed in C++ via the spaceship operator
(operator&lt;=&gt;) and in Rust via the PartialOrd and Ord traits.
Currently, such comparisons produce sub-optimal codegen and optimization
results in some cases.
<br/><br/>
The goal of this project is to resolve these optimization issues by
implementing new 3-way comparison intrinsics, as described in
<a href="https://discourse.llvm.org/t/rfc-add-3-way-comparison-intrinsics/76685">[RFC] Add 3-way comparison intrinsics</a>.
The implementation steps are broadly:
<ol>
<li>Add the intrinsics to LLVM IR.</li>
<li>Implement legalization/expansion support in SelectionDAG and
GlobalISel.</li>
<li>Implement optimization support in ConstantFolding, InstSimplify,
InstCombine, CorrelatedValuePropagation, IndVarSimplify,
ConstraintElimination, IPSCCP, and other relevant transforms.</li>
<li> Make use of the intrinsics via InstCombine canonicalization or
direct emission in clang/rustc.</li>
</ol>
Adding new target-independent intrinsics is a good way of becoming familiar with a broad slice of LLVM!
</p>
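<p>
For reference, the semantics the new intrinsics would compute can be written
down in a couple of lines. The name <tt>three_way_cmp</tt> is illustrative;
see the linked RFC for the actual intrinsic names and signatures.
</p>

```python
# A 3-way comparison returns -1, 0, or 1 for less-than, equal, and
# greater-than respectively; each boolean comparison contributes 0 or 1.
def three_way_cmp(a: int, b: int) -> int:
    return (a > b) - (a < b)

assert three_way_cmp(1, 2) == -1
assert three_way_cmp(2, 2) == 0
assert three_way_cmp(3, 2) == 1
```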
<p><b>Expected result:</b>
Support for the intrinsics in the backend and the most important
optimization passes. Ideally full integration starting at the frontend.
</p>
<p><b>Skills:</b> Intermediate knowledge of C++ </p>
<p><b>Project size:</b> Medium or large</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/nikic">Nikita Popov</a>,
<a href="https://github.com/dc03">Dhruv Chawla</a></p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/llvm-add-3-way-comparison-intrinsics/76807">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_www">Improve the LLVM.org Website Look and Feel</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The llvm.org website serves as the central hub for information about the
LLVM project, encompassing project details, current events, and relevant
resources. Over time, the website has evolved organically, prompting the
need for a redesign to enhance its modernity, structure, and ease of
maintenance.
<br/><br/>
The goal of this project is to create a contemporary and coherent static
website that reflects the essence of LLVM.org. This redesign aims to improve
navigation, taxonomy, content discoverability, mobile device support,
accessibility, and overall usability. Given
the critical role of the website in the community, efforts will be made to
engage with community members, seeking consensus on the proposed changes.
</p>
<p><b>Expected result:</b>
A modern, coherent-looking website that attracts prospective new users and
empowers the existing community with better navigation, taxonomy, content
discoverability, and overall usability. Since the website is critical
infrastructure and most of the community will have an opinion, this project
should engage with the community to build consensus on the
steps being taken. Suggested approach:
<ul>
<li>Conduct a comprehensive content audit of the existing website.</li>
<li>Select appropriate technologies, preferably static site generators
like Hugo or Jekyll.</li>
<li>Advocate for a separation of data and visualization, utilizing formats
such as YAML and Markdown to facilitate content management without
direct HTML coding.</li>
<li>Present three design mockups for the new website, fostering open
discussions and allowing time for alternative proposals from interested
parties.</li>
<li>Implement the chosen design, incorporating valuable feedback from the
community.</li>
<li>Collaborate with content creators to integrate or update content as
needed.</li>
</ul>
The successful candidate should commit to regular participation in weekly
meetings, deliver presentations, and contribute blog posts as requested.
Additionally, they should demonstrate the ability to navigate the community
process with patience and understanding.
</p>
<p><b>Skills:</b>
Knowledge in the area of web development with static site generators.
Knowledge in html, css, bootstrap, and markdown. Patience and self-motivation.
</p>
<p><b>Difficulty:</b> Hard</p>
<p><b>Project size:</b> Large</p>
<p><b>Confirmed Mentors:</b>
<a href=https://github.com/tlattner>Tanya Lattner</a>,
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>
</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/improve-the-llvm-org-website-look-and-feel/76864">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-repl-out-of-process">Out-of-process execution for clang-repl</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The Clang compiler is part of the LLVM compiler infrastructure and supports
various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
Clang enables them to be used as libraries, and has led to the creation of
an entire compiler-assisted ecosystem of tools. The relatively friendly
codebase of Clang and advancements in the JIT infrastructure in LLVM further
enable research into different methods for processing C++ by blurring the
boundary between compile time and runtime. Challenges include incremental
compilation and fitting compile/link time optimizations into a more dynamic
environment.
<br /> <br />
Incremental compilation pipelines process code chunk-by-chunk by building an
ever-growing translation unit. Code is then lowered into the LLVM IR and
subsequently run by the LLVM JIT. Such a pipeline allows creation of
efficient interpreters. The interpreter enables interactive exploration and
makes the C++ language more user friendly. Clang-Repl is one example.
<br /> <br />
Clang-Repl uses the ORCv2 JIT infrastructure within the same process. That
design is efficient and easy to implement; however, it suffers from two
significant drawbacks. First, it cannot be used on devices which do not have
sufficient resources to host the entire infrastructure, such as the Arduino
Due (see this
<a href="https://compiler-research.org/meetings/#caas_10Mar2022">talk</a>
for more details). Second, crashes in user code mean that the entire
process crashes, hindering overall reliability and ease of use.
<br /> <br />
This project aims to move Clang-Repl to an out-of-process execution model
in order to address both of these issues.
</p>
<p><b>Expected result:</b>
<ul>
<li>Implement out-of-process execution of statements with Clang-Repl.</li>
<li>Demonstrate that Clang-Repl can support some of the ez-clang use-cases.</li>
<li>Research approaches to restart/continue the session upon crash.</li>
<li>As a stretch goal, design a versatile reliability approach for crash
recovery.</li>
</ul>
</p>
<p><b>Skills:</b>
Intermediate knowledge of C++, Understanding of LLVM and the LLVM JIT in particular
</p>
<p><b>Project size:</b> Either medium or large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-out-of-process-execution-for-clang-repl/68225">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-plugins-windows">Support clang plugins on Windows</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The Clang compiler is part of the LLVM compiler infrastructure and supports
various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
Clang allows the compiler to be extended with plugins [1]. A plugin makes it
possible to run extra user-defined actions during a compilation. Plugins
are supported on Unix and Darwin, but not on Windows, due to specifics of
the Windows platform.
<br /> <br />
This project would expose the participant to a broad cross section of the LLVM codebase. It involves exploring the API surface, classifying the interfaces as being public or private, and annotating that information to the API declarations. It would also expose the participant to details and differences of different platforms as this work is cross-platform (Windows, Linux, Darwin, BSD, etc). The resulting changes would improve LLVM on Linux and Windows while enabling new functionality on Windows.
</p>
<p><b>Expected result:</b>
This project aims to make <tt>clang -fplugin=windows/plugin.dll</tt> work. The
implementation should extend the working prototype [3] and the
annotation tool [4]. The successful candidate should be prepared to
attend a weekly meeting, make presentations, and prepare blog posts upon
request.
</p>
<p><i>Further reading</i><br />
[1] https://clang.llvm.org/docs/ClangPlugins.html
<br />
[2] https://discourse.llvm.org/t/clang-plugins-on-windows
<br />
[3] https://github.com/llvm/llvm-project/pull/67502
<br />
[4] https://github.com/compnerd/ids
</p>
<p><b>Skills:</b>
Intermediate knowledge of C++, Experience with Windows and its compilation
and linking model.
</p>
<p><b>Project size:</b> Either medium or large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
<a href=https://github.com/compnerd>Saleem Abdulrasool</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/support-clang-plugins-on-windows/76408">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-on-demand-parsing">On Demand Parsing in Clang</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b> Clang, like any C++ compiler, parses a
sequence of characters as they appear, linearly. The linear character
sequence is then turned into tokens and an AST before lowering to machine
code. In many cases the end-user code uses only a small portion of the C++
entities in the entire translation unit, but the user still pays the price
of compiling all of the redundancies.
<br /> <br />
This project proposes to process compilation-heavy C++ entities only when
they are used, rather than eagerly. This approach is already adopted in
Clang’s CodeGen, where it allows Clang to produce code only for what is
actually used. On-demand compilation is expected to significantly reduce
peak memory during compilation and to improve compile time for translation
units which sparsely use their contents. In addition, it would have a
significant impact on interactive C++, where header inclusion essentially
becomes a no-op and entities are parsed only on demand.
<br /> <br />
The Cling interpreter implements a very naive but efficient
cross-translation unit lazy compilation optimization which scales across
hundreds of libraries in the field of high-energy physics.
<br /> <br />
<pre>
// A.h
#include &lt;string&gt;
#include &lt;vector&gt;

template &lt;class T, class U = int&gt; struct AStruct {
  void doIt() { /*...*/ }
  const char* data;
  // ...
};

template &lt;class T, class U = AStruct&lt;T&gt;&gt;
inline void freeFunction() { /* ... */ }

inline void doit(unsigned N = 1) { /* ... */ }

// Main.cpp
#include &quot;A.h&quot;

int main() {
  doit();
  return 0;
}
</pre>
<br /> <br />
This pathological example expands to 37253 lines of code to process. Cling
builds an index (it calls it an autoloading map) which contains only
forward declarations of these C++ entities, roughly 3000 lines of code.
The index looks like:
<pre>
// A.h.index
namespace std{inline namespace __1{template &lt;class _Tp, class _Allocator&gt; class __attribute__((annotate(&quot;$clingAutoload$vector&quot;))) __attribute__((annotate(&quot;$clingAutoload$A.h&quot;))) __vector_base;
}}
...
template &lt;class T, class U = int&gt; struct __attribute__((annotate(&quot;$clingAutoload$A.h&quot;))) AStruct;
</pre>
<br /> <br />
Upon requiring the complete type of an entity, Cling includes the relevant
header file to get it. There are several trivial workarounds to deal with
default arguments and default template arguments as they now appear on the
forward declaration and then the definition. You can read more in [1].
<br /> <br />
Although the implementation could not be called a reference implementation,
it shows that the Parser and the Preprocessor of Clang are relatively
stateless and can be used to process character sequences which are not
linear in their nature. In particular, namespace-scope definitions are
relatively easy to handle, and it is not very difficult to return to
namespace scope when we lazily parse something. For other contexts, such as
local classes, we will have lost some essential information, such as the
name lookup tables for local entities. However, these cases are probably
not very interesting, as lazy parsing is likely worthwhile only at the
granularity of top-level entities.
<br /> <br />
Such an implementation can help with already existing issues in the
standard, such as CWG2335, under which the delayed portions of classes get
parsed immediately when they are first needed, if that first usage precedes
the end of the class. That should give good motivation to upstream all the
operations needed to return to an enclosing scope and parse something.
<br /> <br />
<b>Implementation approach</b>: Upon seeing a tag definition during parsing
we could create a forward declaration, record the token sequence and mark it
as a lazy definition. Later upon complete type request, we could re-position
the parser to parse the definition body. We already skip some of the
template specializations in a similar way [2, 3].
<br /> <br />
Another approach is for every lazily parsed entity to record its token
stream, and to change the Toks stored on LateParsedDeclarations to
optionally refer to a subsequence of an externally stored token sequence
instead of storing its own (or perhaps change CachedTokens so it can do
that transparently). One of the challenges is that we currently modify the
cached token list to append an "eof" token, but it should be possible to
handle that in a different way.
<br /> <br />
In some cases, a class definition can affect its surrounding context in a
few ways you'll need to be careful about here:
<br /> <br />
1) `struct X` appearing inside the class can introduce the name `X` into the
enclosing context.
<br /> <br />
2) `static inline` declarations can introduce global variables with
non-constant initializers that may have arbitrary side-effects.
<br /> <br />
For point (2), there's a more general problem: parsing any expression can
trigger a template instantiation of a class template that has a static data
member with an initializer that has side-effects. Unlike the above two
cases, I don't think there's any way we can correctly detect and handle such
cases by some simple analysis of the token stream; actual semantic analysis
is required to detect such cases. But perhaps if they happen only in code
that is itself unused, it wouldn't be terrible for Clang to have a language
mode that doesn't guarantee that such instantiations actually happen.
<br /> <br />
An alternative, potentially more efficient implementation would be to make
the lookup tables range-based, but we do not yet have even a prototype
demonstrating that this is a feasible approach.
</p>
<p><b>Expected result:</b>
<ul>
<li>Design and implementation of on-demand compilation for non-templated functions</li>
<li>Support non-templated structs and classes</li>
<li>Run performance benchmarks on relevant codebases and prepare report</li>
<li>Prepare a community RFC document</li>
<li>[Stretch goal] Support templates</li>
</ul>
The successful candidate should commit to regular participation in weekly
meetings, deliver presentations, and contribute blog posts as
requested. Additionally, they should demonstrate the ability to navigate the
community process with patience and understanding.
</p>
<p><i>Further reading</i><br/>
[1] https://github.com/root-project/root/blob/master/README/README.CXXMODULES.md#header-parsing-in-root
<br />
[2] https://github.com/llvm/llvm-project/commit/b9fa99649bc99
<br />
[3] https://github.com/llvm/llvm-project/commit/0f192e89405ce
</p>
<p><b>Skills:</b>
Knowledge of C++, Deeper understanding of how Clang works,
knowledge of Clang AST and Preprocessor.
</p>
<p><b>Project size:</b> Large</p>
<p><b>Difficulty:</b> Hard</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
<a href=https://github.com/mizvekov>Matheus Izvekov</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/on-demand-parsing-in-clang/76912">URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-doc-improve-usability">Improve Clang-Doc Usability</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
<a href=https://clang.llvm.org/extra/clang-doc.html>Clang-Doc</a> is a
C/C++ documentation generation tool created as an alternative to Doxygen
and built on top of LibTooling. This effort started in 2018 and a critical
mass of functionality landed in 2019, but development has been largely
dormant since then, mostly due to a lack of resources.
<br /> <br />
The tool can currently generate documentation in Markdown and HTML formats,
but it has some structural issues and is difficult to use; the generated
documentation has usability issues and is missing several key features:
<ul>
<li>Not all C/C++ constructs are currently handled by the Markdown and
HTML emitters, limiting the tool’s usability.</li>
<li>The generated HTML output does not scale with the size of the
codebase, making it unusable for larger C/C++ projects.</li>
<li>The implementation does not always use the most efficient or
appropriate data structures, which leads to correctness and performance
issues.</li>
<li>There is a lot of duplicated boilerplate code which could be
improved with templates and helpers.</li>
</ul>
</p>
<p><b>Expected result:</b>
The goal of this project is to address the existing shortcomings and
improve the usability of Clang-Doc to the point where it can be used to
generate documentation for large scale projects such as LLVM. The ideal
outcome is that the LLVM project will use Clang-Doc for generating its <a
href=https://llvm.org/doxygen/>reference documentation</a>.
<br /><br />
Successful proposals should focus not only on addressing the existing
limitations, but also draw inspiration for other potential improvements
from other similar tools such as <a href=https://hdoc.io/>hdoc</a>, <a
href=https://github.com/standardese/standardese>standardese</a>, <a
href=https://github.com/chromium/subspace/tree/main/subdoc>subdoc</a> or
<a href=https://cs.opensource.google/fuchsia/fuchsia/+/main:tools/cppdocgen/>cppdocgen</a>.
</p>
<p><b>Skills:</b>
Experience with web technologies (HTML, CSS, JS) and an intermediate
knowledge of C++. Previous experience with Clang/LibTooling is a bonus but
not required.
</p>
<p><b>Project size:</b> Either medium or large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/petrhosek>Petr Hosek</a>,
<a href=https://github.com/ilovepi>Paul Kirth</a>
</p>
<p><b>Discourse:</b> <a href=https://discourse.llvm.org/t/improve-clang-doc-usability/76996>URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="rich-disassembler-for-lldb">Rich Disassembler for LLDB</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description</b></p>
<p>Use the variable location information from the debug info to annotate LLDB’s disassembler (and <code>register read</code>) output with the location and lifetime of source variables. The rich disassembler output should be exposed as structured data and made available through LLDB’s scripting API, so that more tooling can be built on top of it. In a terminal, LLDB should render the annotations as text.</p>
<p><b>Expected outcomes</b></p>
For example, we could augment the disassembly for the following function:
<pre>
frame #0: 0x0000000100000f80 a.out`main(argc=1, argv=0x00007ff7bfeff1d8) at demo.c:4:10 [opt]
1 void puts(const char*);
2 int main(int argc, char **argv) {
3 for (int i = 0; i < argc; ++i)
→ 4 puts(argv[i]);
5 return 0;
6 }
(lldb) disassemble
a.out`main:
...
0x100000f71 <+17>: movl %edi, %r14d
0x100000f74 <+20>: xorl %r15d, %r15d
0x100000f77 <+23>: nopw (%rax,%rax)
→ 0x100000f80 <+32>: movq (%rbx,%r15,8), %rdi
0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts
0x100000f89 <+41>: incq %r15
0x100000f8c <+44>: cmpq %r15, %r14
0x100000f8f <+47>: jne 0x100000f80 ; <+32> at demo.c:4:10
0x100000f91 <+49>: addq $0x8, %rsp
0x100000f95 <+53>: popq %rbx
...
</pre>
<p>using the debug information that LLDB also has access to (observe how the source variable <code>i</code> is in <code>r15</code> from <code>0x100000f77</code>+slide onwards)</p>
<pre>
$ dwarfdump demo.dSYM --name i
demo.dSYM/Contents/Resources/DWARF/demo: file format Mach-O 64-bit x86-64
0x00000076: DW_TAG_variable
DW_AT_location (0x00000098:
[0x0000000100000f60, 0x0000000100000f77): DW_OP_consts +0, DW_OP_stack_value
[0x0000000100000f77, 0x0000000100000f91): DW_OP_reg15 R15)
DW_AT_name ("i")
DW_AT_decl_file ("/tmp/t.c")
DW_AT_decl_line (3)
DW_AT_type (0x000000b2 "int")
</pre>
to produce output like this, where we annotate when a variable is live and what its location is:
<pre>
(lldb) disassemble
a.out`main:
... ; i=0
0x100000f74 <+20>: xorl %r15d, %r15d ; i=r15
0x100000f77 <+23>: nopw (%rax,%rax) ; |
→ 0x100000f80 <+32>: movq (%rbx,%r15,8), %rdi ; |
0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts ; |
0x100000f89 <+41>: incq %r15 ; |
0x100000f8c <+44>: cmpq %r15, %r14 ; |
0x100000f8f <+47>: jne 0x100000f80 ; <+32> at t.c:4:10 ; |
0x100000f91 <+49>: addq $0x8, %rsp ; i=undef
0x100000f95 <+53>: popq %rbx
</pre>
<p>The goal would be to produce output like this for a subset of unambiguous cases, for example, variables that are constant or fully in registers.</p>
<p><b>Confirmed mentors and their contacts</b></p>
<ul>
<li>Adrian Prantl aprantl@apple.com (primary contact)
<li>Jonas Devlieghere jdevlieghere@apple.com
</ul>
<p><b>Required / desired skills</b></p>
<p>Required:</p>
<ul>
<li>Good understanding of C++
<li>Familiarity with using a debugger on the terminal
<li>Need to be familiar with all the concepts mentioned in the example above
<li>Need to have a good understanding of at least one assembler dialect for machine code (x86_64 or AArch64).
</ul>
<p>Desired:</p>
<ul>
<li>Compiler knowledge including data flow and control flow analysis is a plus.
<li>Being able to navigate debug information (DWARF) is a plus.
</ul>
<p><b>Size of the project.</b></p>
<p>medium (~175h)</p>
<p><b>An easy, medium or hard rating if possible</b></p>
<p>hard</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/rich-disassembler-for-lldb/76952">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="gpu-delta-debugging">GPU Delta Debugging</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description</b></p>
<p>
LLVM-reduce and similar tools perform delta debugging, but they are less useful when many implicit constraints exist and their violation can easily lead to errors similar to the one that is to be isolated. This project is about developing a GPU-aware version, especially for execution-time bugs, that can be used in conjunction with LLVM/OpenMP GPU record-and-replay, or simply a GPU loader script, to minimize GPU test cases more efficiently and effectively.
</p>
<p><b>Expected outcomes</b></p>
<p>A tool to reduce GPU errors without losing the original error. Optionally, other properties could be the focus of the reduction, not only errors.</p>
<p><b>Confirmed mentors and their contacts</b></p>
<ul>
<li>Parasyris, Konstantinos parasyris1@llnl.gov
<li>Johannes Doerfert jdoerfert@llnl.gov
</ul>
<p><b>Required / desired skills</b></p>
<p>Required:</p>
<ul>
<li>Good understanding of C++
<li>Familiarity with GPUs and LLVM-IR
</ul>
<p>Desired:</p>
<ul>
<li>Compiler knowledge including data flow and control flow analysis is a plus.
<li>Experience with debugging and bug reduction techniques (llvm-reduce) is helpful
</ul>
<p><b>Size of the project.</b></p>
<p>medium</p>
<p><b>An easy, medium or hard rating if possible</b></p>
<p>medium</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/gsoc-2024-gpu-delta-debugging/77237">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="offload-libcxx">Offloading libcxx</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description</b></p>
<p>
Modern C++ defines parallel algorithms as part of the standard library, like `std::transform_reduce(std::execution::par_unseq, vec.begin(), vec.end(), 0, std::plus&lt;int&gt;{}, …)`. In this project we want to extend an OpenMP-based implementation of those algorithms, including GPU offload where reasonable. While some algorithms might be amenable to GPU offload via a pure (wrapper) runtime solution, we know others, especially those featuring user-provided functors, will also require static program analysis and potentially transformation for additional data management. The goal of the project is to explore different algorithms and the options we have to execute them on the host as well as on accelerator devices, especially GPUs, automatically via OpenMP.
</p>
<p><b>Expected outcomes</b></p>
<p> Improvements to the prototype support for offloading in libcxx. Evaluations against other offloading approaches, and documentation of the missing parts and shortcomings. </p>
<p><b>Confirmed mentors and their contacts</b></p>
<ul>
<li>Johannes Doerfert jdoerfert@llnl.gov
<li>Tom Scogland scogland1@llnl.gov
<li>Tom Deakin tom.deakin@bristol.ac.uk
</ul>
<p><b>Required / desired skills</b></p>
<p>Required:</p>
<ul>
<li>Good understanding of C++ and C++ standard algorithms
<li>Familiarity with GPUs and (OpenMP) offloading
</ul>
<p>Desired:</p>
<ul>
<li>Experience with libcxx (development).
<li>Experience debugging and profiling GPU code.
</ul>
<p><b>Size of the project.</b></p>
<p>large</p>
<p><b>An easy, medium or hard rating if possible</b></p>
<p>medium</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/gsoc-2024-offloading-libcxx/77238">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="parameter-tuning">The 1001 thresholds in LLVM</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description</b></p>
<p>
LLVM has lots of thresholds and flags to avoid "costly cases". However, it is unclear whether these thresholds are useful, whether their values are reasonable, and what impact they really have. Since there are many, we cannot do a simple exhaustive search. In some prototype work we introduced a C++ class that can replace hardcoded values and offers control over the threshold; e.g., you can increase the recursion limit via a command-line flag from the hardcoded "6" to a different number. In this project we want to explore the thresholds: when they are hit, what it means when they are hit, how we should select their values, and whether we need different "profiles".
</p>
<p><b>Expected outcomes</b></p>
<p> Statistical evidence on the impact of various thresholds inside of LLVM's code base, including compile time changes, impact on transformations, and performance measurements. </p>
<p><b>Confirmed mentors and their contacts</b></p>
<ul>
<li>Jan Hueckelheim jhueckelheim@anl.gov
<li>Johannes Doerfert jdoerfert@llnl.gov
<li>William Moses wmoses@mit.edu
</ul>
<p><b>Required / desired skills</b></p>
<p>Required:</p>
<ul>
<li>Profiling skills and knowledge of statistical reasoning
</ul>
<p>Desired:</p>
<ul>
<li>Good understanding of the LLVM code base and optimization flow
</ul>
<p><b>Size of the project.</b></p>
<p>medium</p>
<p><b>An easy, medium or hard rating if possible</b></p>
<p>easy</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/gsoc-2024-the-1001-thresholds-in-llvm/77235">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="gpu-libc">Performance tuning the GPU libc</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description</b></p>
<p>
We have begun work on a libc library targeting GPUs. This will allow users to call functions such as malloc or memcpy while executing on the GPU. However, it is important that these implementations be functional and performant. The goal of this project is to benchmark the implementations of certain libc functions on the GPU. Work would include writing benchmarks to test the current implementations as well as writing more optimal implementations.
</p>
<p><b>Expected outcomes</b></p>
<p> In-depth performance analysis of libc functions. Measurements of the overhead of GPU-to-CPU remote procedure calls. More optimal implementations of 'libc' functions. </p>
<p><b>Confirmed mentors and their contacts</b></p>
<ul>
<li>Joseph Huber joseph.huber@amd.com
<li>Johannes Doerfert jdoerfert@llnl.gov
</ul>
<p><b>Required / desired skills</b></p>
<p>Required:</p>
<ul>
<li>Profiling skills and understanding of GPU architecture
</ul>
<p>Desired:</p>
<ul>
<li>Experience with libc utilities
</ul>
<p><b>Size of the project.</b></p>
<p>small</p>
<p><b>An easy, medium or hard rating if possible</b></p>
<p>easy</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/libc-gsoc-2024-performance-and-testing-in-the-gpu-libc/77042">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="gpu-first">Improve GPU First Framework</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description</b></p>
<p>
<a href="https://arxiv.org/abs/2306.11686">GPU First</a> is a methodology and framework that can enable any existing host code to execute the entire program on a GPU without any modification from users.
The goal of this project is twofold:
1) Port the <a href="https://github.com/shiltian/llvm-project/tree/direct_gpu_compilation">host code</a> handling RPC to the new plugin and rewrite it with the host RPC framework introduced in the GPU LibC project.
2) Explore support for MPI among multiple thread blocks on a single GPU, or even across multiple GPUs.
</p>
<p><b>Expected outcomes</b></p>
<p> More efficient GPU First framework that can support both NVIDIA and AMD GPUs. Optionally, upstream the framework. </p>
<p><b>Confirmed mentors and their contacts</b></p>
<ul>
<li>Shilei Tian i@tianshilei.me
<li>Johannes Doerfert jdoerfert@llnl.gov
<li>Joseph Huber joseph.huber@amd.com
</ul>
<p><b>Required / desired skills</b></p>
<p>Required:</p>
<ul>
<li>Good understanding of C++ and GPU architecture
<li>Familiarity with GPUs and LLVM IR
</ul>
<p>Desired:</p>
<ul>
<li>Good understanding of the LLVM code base and OpenMP target offloading
</ul>
<p><b>Size of the project.</b></p>
<p>medium</p>
<p><b>An easy, medium or hard rating if possible</b></p>
<p>medium</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/openmp-gsoc-2024-improve-gpu-first-framework/77048">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clangir-gpu">Compile GPU kernels using ClangIR</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description:</b>
Heterogeneous programming models such as
<a href="https://sycl.tech">SYCL</a>,
<a href="https://www.openmp.org">OpenMP</a> and
<a href="https://www.openacc.org">OpenACC</a> help developers to offload
computationally intensive kernels to GPUs and other accelerators.
<a href="https://mlir.llvm.org">MLIR</a> is expected to unlock new
high-level optimisations and better code generation for the next generation
of compilers for heterogeneous programming models. However, the availability
of a robust MLIR-emitting C/C++ frontend is a prerequisite for these
efforts.
</p><p>
The <a href="https://clangir.org">ClangIR</a> (CIR) project aims to
establish a new intermediate representation (IR) for Clang. Built on top of
MLIR, it provides a dialect for C/C++ based languages in Clang, and the
necessary infrastructure to emit it from the Clang AST, as well as a
lowering path to the LLVM-IR dialect. Over the last year, ClangIR has
evolved into a mature incubator project, and a recent
<a href="https://discourse.llvm.org/t/rfc-upstreaming-clangir/76587">RFC</a>
on upstreaming it into the LLVM monorepo has seen positive comments and
community support.
</p><p>
The overall goal of this GSoC project is to identify and implement missing
features in ClangIR to make it possible to compile GPU kernels in the
<a href="https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html">OpenCL C language</a>
to LLVM-IR for the
<a href="https://registry.khronos.org/SPIR-V">SPIR-V</a> target. The OpenCL
to SPIR-V flow is a great environment for this project because a) it is
<a href="https://clang.llvm.org/docs/OpenCLSupport.html">already supported</a>
in Clang and b) OpenCL's work-item- and work-group-based programming model
still captures modern GPU architectures well. The contributor will extend
the AST visitors, the dialect and the LLVM-IR lowering, to add support e.g.
for multiple address spaces, vector and custom floating point types, and the
<code>spir_kernel</code> and <code>spir_func</code> calling conventions.
</p><p>
A good starting point for this work is the
<a href="https://github.com/sgrauerg/polybenchGpu/tree/master/OpenCL">Polybench-GPU</a>
benchmark suite. It contains self-contained, small- to medium-sized OpenCL
implementations of common algorithms. We expect only the device code (*.cl
files) to be compiled via ClangIR. The existing OpenCL support in Clang can
be used to create lit tests with reference LLVM-IR output to guide the
development. Optionally, the built-in result verification and time
measurements in Polybench could also be used to assess the correctness and
quality of the generated code.
</p>
<p><b>Expected result:</b>
Polybench-GPU's
<a href="https://github.com/sgrauerg/polybenchGpu/blob/master/OpenCL/2DCONV/2DConvolution.cl"><code>2DCONV</code></a>,
<a href="https://github.com/sgrauerg/polybenchGpu/blob/master/OpenCL/GEMM/gemm.cl"><code>GEMM</code></a> and
<a href="https://github.com/sgrauerg/polybenchGpu/blob/master/OpenCL/CORR/correlation.cl"><code>CORR</code></a>
OpenCL kernels can be compiled with ClangIR to LLVM-IR for SPIR-V.
</p>
<p><b>Skills:</b>
Intermediate C++ programming skills and familiarity with basic compiler
design concepts are required. Prior experience with LLVM IR, MLIR, Clang or
GPU programming is a big plus, but a willingness to learn these is also
acceptable.
</p>
<p><b>Project size:</b> Large</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/jopperm">Julian Oppermann</a>,
<a href="https://github.com/Naghasan">Victor Lom&uuml;ller</a>,
<a href="https://github.com/bcardosolopes">Bruno Cardoso Lopes</a>
</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/clangir-compile-gpu-kernels-using-clangir/76984">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="half-precision-libc">Half precision in LLVM libc</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description:</b></p>
<p>
Half precision is an IEEE 754 floating point format that has been widely
used recently, especially in machine learning and AI. It has been
standardized as _Float16 in the latest C23 standard, bringing its support to
the same level as float or double data types. The goal for this project is
to implement C23 half precision math functions in the LLVM libc library.
</p>
<p><b>Expected result:</b></p>
<ul>
<li> Setup the generated headers properly so that the type and the functions
can be used with various compilers (+versions) and architectures. </li>
<li> Implement generic basic math operations supporting half precision data
types that work on supported architectures: x86_64, arm (32 + 64),
risc-v (32 + 64), and GPUs. </li>
<li> Implement specializations using compiler builtins or special hardware
instructions to improve their performance whenever possible. </li>
<li> If time permits, we can start investigating higher math functions for
half precision. </li>
</ul>
<p><b>Skills:</b></p>
<p>
Intermediate C++ programming skills and familiarity with basic compiler
design concepts are required. Prior experience with LLVM IR, MLIR, Clang or
GPU programming is a big plus, but a willingness to learn these is also
acceptable.
</p>
<p><b>Project size:</b> Large </p>
<p><b>Difficulty:</b> Easy/Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="mailto:lntue@google.com">Tue Ly</a>,
<a href="mailto:joseph.huber@amd.com">Joseph Huber</a>,
</p>
<p><b>Discourse:</b>
<a href="https://discourse.llvm.org/t/libc-gsoc-2024-half-precision-in-llvm-libc/77027">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_sectiontitle">
<a name="gsoc23">Google Summer of Code 2023</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p>
Google Summer of Code 2023 was very successful for the LLVM project. For
the list of accepted and completed projects, please take a look at the
Google Summer of
Code <a href="https://summerofcode.withgoogle.com/archive/2023/organizations/llvm-compiler-infrastructure">website</a>.
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsection">
<a>LLVM</a>
</div>
<!-- *********************************************************************** -->
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_new_jitlink_reopt">Re-optimization using JITLink</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
In Just-In-Time compilers we often choose a low optimization level to
minimize compile time and improve launch times and latencies; however, some
functions (which we call hot functions) are used very frequently, and for
these functions it is worth optimizing more heavily. In general, hot
functions can only be identified at runtime (different inputs will cause
different functions to become hot), so the aim of the reoptimization
project is to build infrastructure to (1) detect hot functions at runtime
and (2) compile them a second time at a higher optimization level, hence
the name "re-optimization".
<br /><br />
There are many possible approaches to both parts of this problem. E.g. hot
functions could be identified by sampling, or using existing profiling
infrastructure, or by implementing custom instrumentation. Reoptimization
could be applied to whole functions, or outlining could be used to enable
optimization of portions of functions. Re-entry into the JIT infrastructure
from JIT’d code might be implemented on top of existing lazy compilation, or
via a custom path.
<br /><br />
Whatever design is adopted, the goal is that the infrastructure should be
generic so that it can be used by other LLVM API clients, and should support
out-of-process JIT-compilation (so some of the solution will be implemented
in the ORC runtime).
<p><b>Expected result:</b>
<ul>
<li>Improve ergonomics of indirection – ideally all forms of indirection
(for re-optimization, lazy compilation, and procedure-linkage-tables)
should be able to share a single stub (and/or binary rewriting metadata)
at runtime.</li>
<li>Implement basic re-optimization on top of the tidied up
indirection.</li>
<li>(Stretch goal) Garbage-collect unoptimized code that is no longer
needed once the optimized version is available.</li>
</ul>
<p><b>Desirable skills:</b>
Intermediate C++; Understanding of LLVM and the LLVM JIT in particular.
<p><b>Project size:</b> Large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
<a href=https://github.com/lhames>Lang Hames</a></p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/re-optimization-using-jitlink/68260">URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_new_jitlink_backends">JITLink new backends</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
JITLink is LLVM's new JIT linker API -- the low-level API that transforms
compiler output (relocatable object files) into ready-to-execute bytes in
memory. To do this JITLink’s generic linker algorithm needs to be
specialized to support the target object format (COFF, ELF, MachO), and
architecture (arm, arm64, i386, x86-64). LLVM already has mature
implementations of JITLink for MachO/arm64, MachO/x86-64, ELF/x86-64,
ELF/aarch64 and COFF/x86-64, while the implementations for ELF/riscv,
ELF/aarch32 and COFF/i386 are still relatively new.
<br />
You can either work on an entirely new architecture like PowerPC or eBPF,
or complete one of the recently added JITLink implementations. In both cases
you will likely reuse the existing generic code for one of the target object
formats. You will also work on relocation resolution, populating PLTs and
GOTs, and wiring up the ORC runtime for your chosen target.
<br />
<p><b>Expected result:</b>
Write a JITLink specialization for a not-yet-supported or incomplete
format/architecture such as PowerPC, AArch32 or eBPF.
<p><b>Desirable skills:</b>
Intermediate C++; Understanding of LLVM and the LLVM JIT in particular;
familiarity with your chosen format/architecture, and basic linker concepts
(e.g. sections, symbols, and relocations).
<p><b>Project size:</b> Large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/vgvassilev">Vassil Vassilev</a>,
<a href="https://github.com/lhames">Lang Hames</a>,
<a href="https://github.com/weliveindetail">Stefan Gränitz</a></p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/jitlink-new-backends/68223">URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_improving_compile_times">Improving compile times</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
While the primary job of a compiler is to produce fast code (good run-time
performance), it is also important that optimization doesn’t take too much
time (good compile-time performance). The goal of this project is to improve
compile-time without hurting optimization quality.
<br />
The general approach to this project is:
<ol>
<li>Pick a workload to optimize. For example, this could be a file from
<a href="https://github.com/llvm/llvm-test-suite/tree/main/CTMark">CTMark</a>
compiled in a certain build configuration (e.g. <code>-O0 -g</code> or
<code>-O3 -flto=thin</code>).</li>
<li>Collect profiling information. This could involve compiler options like
<code>-ftime-report</code> or <code>-ftime-trace</code> for a high-level
overview, as well as <code>perf record</code> or
<code>valgrind --tool=callgrind</code> for a detailed profile.</li>
<li>Identify places that are unexpectedly slow. This is heavily workload
dependent.</li>
<li>Try to optimize an identified hotspot, ideally without impacting generated
code. The <a href="https://llvm-compile-time-tracker.com/">compile-time tracker</a>
can be used to quickly evaluate impact on CTMark.</li>
</ol>
As a disclaimer, it should be noted that outside of pathological cases,
compilation doesn’t tend to have a convenient hotspot where 90% of the time
is spent, instead it is spread out across many passes. As such, individual
improvements also tend to have only small impact on overall compile-time.
Expect to do 10 improvements of 0.2% each, rather than one improvement of 2%.
</p>
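<p>To put the disclaimer above in numbers: ten multiplicative 0.2% improvements reduce compile time to roughly 98.02% of baseline, essentially the same as a single 2% improvement. A quick arithmetic sketch:</p>

```python
# Ten independent compile-time improvements of 0.2% each, applied
# multiplicatively, compared against a single 2% improvement.
many_small = 1.0
for _ in range(10):
    many_small *= 1 - 0.002  # each win keeps 99.8% of the previous time

one_big = 1 - 0.02  # a single 2% win keeps 98% of the time

print(f"ten 0.2% wins: {many_small:.4%} of baseline")
print(f"one  2%  win:  {one_big:.4%} of baseline")
```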
<p><b>Expected result:</b>
Substantial improvements on some individual files (multiple percent), and a
small improvement on overall geomean compile-time.</p>
<p><b>Desirable skills:</b>
Intermediate C++. Familiarity with profiling tools (especially if you are
not on Linux, in which case I won’t be able to help).</p>
<p><b>Project size:</b> Either medium or large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b> <a href="https://github.com/nikic">Nikita Popov</a></p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/llvm-improving-compile-times/68094">URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_addressing_rust_optimization_failures">Addressing Rust optimization failures</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The <a href="https://www.rust-lang.org/">Rust programming language</a> uses
LLVM for code generation, and heavily relies on LLVM’s optimization
capabilities. However, there are many cases where LLVM fails to optimize
typical code patterns that are emitted by rustc. Such issues are reported
using the <a href="https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AI-slow">I-slow</a>
and/or <a href="https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AA-LLVM">A-LLVM</a> labels.
<br />
The usual approach to fixing these issues is:
<ol>
<li>Inspect the <code>--emit=llvm-ir</code> output on
<a href="https://rust.godbolt.org/">Godbolt</a>.</li>
<li>Create an LLVM IR test case that is not optimized when run through
<code>opt -O3</code>.</li>
<li>Identify a minimal missing transform and prove its correctness
using <a href="https://alive2.llvm.org/ce/">alive2</a>.</li>
<li>Identify which LLVM pass or passes could perform the transform.</li>
<li>Add necessary test coverage and implement the transform.</li>
<li>(Much later: Check that the issue is really resolved after the next
major LLVM version upgrade in Rust.)</li>
</ol>
The goal of this project is to address some of the less hard optimization
failures. This means that in some cases, the process would stop after step 3
or 4 without proceeding to implementation, because it’s unclear how the issue
could be addressed, or it would take a large amount of effort. Having an
analysis of the problem is still valuable in that case.
</p>
<p><b>Expected result:</b>
Fixes for a number of easy to medium Rust optimization failures. Preliminary
analysis for some failures even if no fix was implemented.</p>
<p><b>Desirable skills:</b>
Intermediate C++ for implementation. Some familiarity with LLVM (at least
ability to understand LLVM IR) for analysis. Basic Rust knowledge (enough
to read, but not write Rust).</p>
<p><b>Project size:</b> Either medium or large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b> <a href="https://github.com/nikic">Nikita Popov</a></p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/llvm-addressing-rust-optimization-failures-in-llvm/68096">URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-repl-autocompletion">Implement autocompletion in clang-repl</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The Clang compiler is part of the LLVM compiler infrastructure and supports
various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
Clang enables them to be used as libraries, and has led to the creation of
an entire compiler-assisted ecosystem of tools. The relatively friendly
codebase of Clang and advancements in the JIT infrastructure in LLVM further
enable research into different methods for processing C++ by blurring the
boundary between compile time and runtime. Challenges include incremental
compilation and fitting compile/link time optimizations into a more dynamic
environment.
<br /> <br />
Incremental compilation pipelines process code chunk-by-chunk by building an
ever-growing translation unit. Code is then lowered into the LLVM IR and
subsequently run by the LLVM JIT. Such a pipeline allows creation of
efficient interpreters. The interpreter enables interactive exploration and
makes the C++ language more user friendly. The incremental compilation mode
is used by the interactive C++ interpreter, Cling, initially developed to
enable interactive high-energy physics analysis in a C++ environment.
<br /> <br />
<a href="https://compiler-research.org/">Our group</a> is working to
incorporate and possibly redesign parts of Cling in mainline Clang through a
new tool, clang-repl. The project aims at the design and implementation of
robust autocompletion when users type C++ at the prompt of clang-repl.
For example:
<pre>
[clang-repl] class MyLongClassName {};
[clang-repl] My&lt;tab&gt;
// list of suggestions.
</pre>
</p>
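<p>The prefix-filtering step of the feature can be sketched in a few lines; a toy Python sketch that ignores all of clang's semantic machinery (the real implementation would query clang's completion support on the partial translation unit rather than a plain name list):</p>

```python
def complete(prefix, decls):
    """Toy prefix-based completion: filter previously seen declarations.
    Stand-in only; clang-repl would consult libInterpreter, not a list."""
    return sorted(d for d in decls if d.startswith(prefix) and d != prefix)

# Declarations accumulated from earlier REPL input (illustrative).
decls = ["MyLongClassName", "MyOtherClass", "main"]
print(complete("My", decls))  # ['MyLongClassName', 'MyOtherClass']
```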
<p><b>Expected result:</b>
There are several foreseen tasks:
<ul>
<li>Research the current approaches for autocompletion in clang, such as
<code>clang -code-completion-at=file:line:column</code>.</li>
<li>Implement a version of the autocompletion support using the partial
translation unit infrastructure in clang’s libInterpreter.</li>
<li>Investigate the requirements for semantic autocompletion which takes
into account the exact grammar position and semantics of the code. Eg:
<pre>
[clang-repl] struct S {S* operator+(S&) { return nullptr;}};
[clang-repl] S a, b;
[clang-repl] v = a + &lt;tab&gt; // shows b as the only acceptable choice here.
</pre>
</li>
<li>Present the work at the relevant meetings and conferences.</li>
</ul>
</p>
<p><b>Project size:</b> Large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-repl-implement-autocompletion-in-clang-repl/60364">URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-modules-build-daemon">Modules build daemon: build system agnostic support for explicitly built modules</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b> Clang currently handles modules independently in each
<code>clang</code> instance using the filesystem for synchronization of which instance builds
a given module. This has many issues with soundness and performance due to tradeoffs made for
module reuse and filesystem contention.</p>
<p>Clang has another way of building modules, explicitly built modules, that currently requires
build system changes to adopt. Here the build system determines which modules are needed, for
example by using <a href="https://github.com/llvm/llvm-project/tree/main/clang/tools/clang-scan-deps">clang-scan-deps</a>,
and ensures those modules are built before running the <code>clang</code> compile task that
needs them.</p>
<p>In order to allow adoption of this new way of building modules without major build system work
we need a module build daemon. With a small change to the command line, clang will connect to
this daemon and ask for the modules it needs. The module build daemon then either returns an
existing valid module, or builds and then returns it.</p>
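<p>The daemon's core decision is a cache-or-build loop. A toy Python sketch under illustrative assumptions (the class and method names are made up, and the real daemon would invoke clang over an IPC connection rather than call an in-process function):</p>

```python
import hashlib

class ModuleBuildDaemon:
    """Toy sketch: return a cached module if its inputs are unchanged,
    otherwise (re)build it. Names are illustrative, not a clang API."""

    def __init__(self):
        self._cache = {}  # module name -> (input hash, built artifact)

    def _hash_inputs(self, sources):
        h = hashlib.sha256()
        for src in sources:
            h.update(src.encode())
        return h.hexdigest()

    def build_module(self, name, sources):
        # Placeholder for "run a clang compile task for the module".
        return f"pcm({name})"

    def get_module(self, name, sources):
        key = self._hash_inputs(sources)
        cached = self._cache.get(name)
        if cached and cached[0] == key:
            return cached[1]  # existing valid module
        artifact = self.build_module(name, sources)
        self._cache[name] = (key, artifact)
        return artifact

daemon = ModuleBuildDaemon()
a = daemon.get_module("Foo", ['module Foo { header "foo.h" }'])
b = daemon.get_module("Foo", ['module Foo { header "foo.h" }'])  # cache hit
print(a)
```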
<p>There is an existing open-source dependency scanning daemon in an llvm-project fork.
This only handles file dependencies, but has an IPC mechanism. This IPC system could be used as
a base for the modules build daemon, but does need to be extended to work on Windows.</p>
<p><b>Expected result:</b> A normal project using Clang modules with an existing build system
(like Make or CMake) can be built using only explicitly built modules via a modules build
daemon.</p>
<p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity with compilers;
familiarity with Clang is an asset, but not required.</p>
<p><b>Project size:</b> 175h or 350h depending on reuse of IPC</p>
<p><b>Difficulty:</b> medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/Bigcheese">Michael Spencer</a>,
<a href="https://github.com/jansvoboda11">Jan Svoboda</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-modules-build-daemon-build-system-agnostic-support-for-explicitly-built-modules/68224">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-extract-api-categories">ExtractAPI Objective-C categories</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b> <a href="https://github.com/apple/swift-docc">Swift-DocC</a> is
the canonical documentation compiler for the Swift OSS project. However,
Swift-DocC is not Swift-specific: it
uses <a href="https://github.com/apple/swift-docc-symbolkit/blob/main/openapi.yaml">SymbolKit</a>'s
language-agnostic JSON-based symbol graph format to understand which
symbols are available in the code, so any language can be supported by
Swift-DocC as long as there is a symbol graph generator.</p>
<p>Clang supports symbol graph generation for C and Objective-C as described
in <a href="https://discourse.llvm.org/t/rfc-clang-support-for-api-information-generation-in-json/58845">[RFC]
clang support for API information generation in JSON</a>. Today, support for
Objective-C categories is incomplete: if a category extends a type in the
current module, the category's members are treated as belonging to the
extended type itself, while if the extended type belongs to another module,
the category is ignored entirely. Nonetheless, extending types that belong
to other modules is common in Objective-C and often forms part of a module's
public API. The goal of this project is to extend the symbol graph format to
accommodate Objective-C categories and to implement support for generating
this information both through clang and through libclang.</p>
<p><b>Expected result:</b> Adding the necessary support to clang's symbol graph
generator and in libclang for describing categories of symbols defined in
other modules. This might involve additions to SymbolKit that would need to be
discussed with that community.</p>
<p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity
with clang and Objective-C are assets but not required.</p>
<p><b>Project size:</b> Medium</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/daniel-grumberg">Daniel Grumberg</a>,
<a href="https://github.com/zixu-w">Zixu Wang</a>,
<a href="https://github.com/ributzka">Juergen Ributzka</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-extractapi-objective-c-categories/68370">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-extract-api-cpp-support">ExtractAPI C++ Support</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b> <a href="https://github.com/apple/swift-docc">Swift-DocC</a> is
the canonical documentation compiler for the Swift OSS project. However,
Swift-DocC is not Swift-specific: it
uses <a href="https://github.com/apple/swift-docc-symbolkit/blob/main/openapi.yaml">SymbolKit</a>'s
language-agnostic JSON-based symbol graph format to understand which
symbols are available in the code, so any language can be supported by
Swift-DocC as long as there is a symbol graph generator.</p>
<p>Clang supports symbol graph generation for C and Objective-C as described
in <a href="https://discourse.llvm.org/t/rfc-clang-support-for-api-information-generation-in-json/58845">[RFC]
clang support for API information generation in JSON</a>.</p>
<p>Currently the emitted symbol graph format does not support various C++
constructs such as templates and exceptions and the symbol graph generator
does not fully understand C++. This project aims to introduce support for
various C++ constructs in the symbol graph format and to implement support for
generating this data in clang.</p>
<p><b>Expected result:</b> Adding the necessary support to clang's symbol graph
generator for describing C++ constructs such as templates and
exceptions. This will involve additions to SymbolKit that would need to be
discussed with that community.</p>
<p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity
with clang and Objective-C are assets but not required.</p>
<p><b>Project size:</b> Large</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/daniel-grumberg">Daniel Grumberg</a>,
<a href="https://github.com/zixu-w">Zixu Wang</a>,
<a href="https://github.com/ributzka">Juergen Ributzka</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/extractapi-c-support/68371">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-extract-api-while-building">ExtractAPI while building</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b> <a href="https://github.com/apple/swift-docc">Swift-DocC</a> is
the canonical documentation compiler for the Swift OSS project. However,
Swift-DocC is not Swift-specific: it
uses <a href="https://github.com/apple/swift-docc-symbolkit/blob/main/openapi.yaml">SymbolKit</a>'s
language-agnostic JSON-based symbol graph format to understand which
symbols are available in the code, so any language can be supported by
Swift-DocC as long as there is a symbol graph generator.</p>
<p>Clang supports symbol graph generation for C and Objective-C as described
in <a href="https://discourse.llvm.org/t/rfc-clang-support-for-api-information-generation-in-json/58845">[RFC]
clang support for API information generation in JSON</a>.</p>
<p>Currently users can use clang to generate symbol graph files using
the <code>clang -extract-api</code> command line interface or generating
symbol graphs for a specific symbol using the libclang interface. This project
would entail adding a third mode that would generate the symbol graph output
as a side-effect of a regular compilation job. This can enable using the
symbol graph format as a lightweight alternative to clang Index or clangd
for code intelligence services.</p>
<p><b>Expected result:</b> Enable generating symbol graph files during a
regular compilation (or module build); provide a tool to merge symbol graph
files in the same way a static linker links individual object files; extend
clang Index to support all the information contained by symbol graph
files.</p>
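<p>The "merge symbol graph files like a static linker" step can be sketched as unioning the <code>symbols</code> and <code>relationships</code> arrays of the format with deduplication. A toy Python sketch (the merge policy and the handling of the remaining top-level objects are illustrative assumptions):</p>

```python
def merge_symbol_graphs(graphs):
    """Union the "symbols" and "relationships" arrays of several symbol
    graph objects, deduplicating by precise identifier and by
    (source, target, kind) triple. Sketch only; the real format also
    carries "metadata" and "module" objects that must be reconciled."""
    symbols, relationships = {}, {}
    for g in graphs:
        for s in g.get("symbols", []):
            symbols[s["identifier"]["precise"]] = s
        for r in g.get("relationships", []):
            relationships[(r["source"], r["target"], r["kind"])] = r
    return {"symbols": list(symbols.values()),
            "relationships": list(relationships.values())}

# Two per-translation-unit graphs sharing one symbol (illustrative data).
g1 = {"symbols": [{"identifier": {"precise": "c:@F@foo"}}],
      "relationships": []}
g2 = {"symbols": [{"identifier": {"precise": "c:@F@foo"}},
                  {"identifier": {"precise": "c:@F@bar"}}],
      "relationships": [{"source": "c:@F@bar", "target": "c:@F@foo",
                         "kind": "memberOf"}]}
merged = merge_symbol_graphs([g1, g2])
print(len(merged["symbols"]), len(merged["relationships"]))  # 2 1
```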
<p><b>Desirable skills:</b> Intermediate C++ programming skills; familiarity
with clang and Objective-C are assets but not required.</p>
<p><b>Project size:</b> Medium</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/daniel-grumberg">Daniel Grumberg</a>,
<a href="https://github.com/zixu-w">Zixu Wang</a>,
<a href="https://github.com/ributzka">Juergen Ributzka</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-extractapi-while-building/68372">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-improve-diagnostics2">Improve Clang diagnostics</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description:</b>
The diagnostics clang emits are ultimately its interface to the developer. While the diagnostics are generally good, there are some rough edges that need to be ironed out. Some cases can be improved by special-casing them in the compiler as well.
</p>
<p>
As one can see from Clang’s issue tracker, there are <a href="https://github.com/llvm/llvm-project/issues?page=2&q=is%3Aopen+is%3Aissue+label%3Aclang%3Adiagnostics">lots of issues</a> open against clang’s diagnostics.
</p>
<p>
This project does not aim to implement one big feature but instead focuses on smaller, incremental improvements to Clang’s diagnostics.
</p>
<p>
Possible example issues to resolve:
<ul>
<li><a href="https://github.com/llvm/llvm-project/issues/59872">Calling nullptr function pointer in a constexpr function results in poor diagnostic</a></li>
<li><a href="https://github.com/llvm/llvm-project/issues/58601">Print name of uninitialized subobject (instead of type)</a></li>
<li><a href="https://github.com/llvm/llvm-project/issues/57906">https://github.com/llvm/llvm-project/issues/57906</a></li>
<li><a href="https://github.com/llvm/llvm-project/issues/57337">clang(++) unhelpful frame-larger-than warning, very small stack frame exceeding very large limit</a></li>
<li>Any other diagnostics issue you find interesting or ran into personally.</li>
</ul>
</p>
<p><b>Expected outcomes</b>:
At least three fixed smaller diagnostics issues, or one larger implemented diagnostics improvement.
</p>
<p><b>Confirmed Mentor:</b> <a href="https://github.com/tbaederr">Timm Bäder</a></p>
<p><b>Desirable skills:</b>
<ul>
<li>Intermediate C++ knowledge.</li>
<li>Preferably experience in the Clang code base, since the issues mentioned can have their root cause in various parts of it.</li>
<li>Preferably an already working local LLVM build</li>
</ul>
</p>
<p><b>Project type:</b> Medium/200 hr</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/improve-clang-diagnostics-2/68900/3">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-tutorials-clang-repl">Tutorial development with clang-repl</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description:</b>
The Clang compiler is part of the LLVM compiler infrastructure and supports
various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
Clang enables them to be used as libraries, and has led to the creation of
an entire compiler-assisted ecosystem of tools. The relatively friendly
codebase of Clang and advancements in the JIT infrastructure in LLVM further
enable research into different methods for processing C++ by blurring the
boundary between compile time and runtime. Challenges include incremental
compilation and fitting compile/link time optimizations into a more dynamic
environment.
</p>
<p>
Incremental compilation pipelines process code chunk-by-chunk by building an
ever-growing translation unit. Code is then lowered into the LLVM IR and
subsequently run by the LLVM JIT. Such a pipeline allows creation of
efficient interpreters. The interpreter enables interactive exploration and
makes the C++ language more user friendly. The incremental compilation mode
is used by the interactive C++ interpreter, Cling, initially developed to
enable interactive high-energy physics analysis in a C++ environment.
</p>
<p>
We are working to incorporate and possibly redesign parts of Cling in
mainline Clang through a new tool, clang-repl. The project aims to implement
tutorials demonstrating the capabilities of clang-repl and to investigate
its adoption in the xeus-clang-repl prototype, which allows writing C++
in Jupyter.
</p>
<p><b>Expected result:</b>
There are several foreseen tasks:
<ul>
<li>Write several tutorials demonstrating the current capabilities of
clang-repl.</li>
<li>Investigate the requirements for adding clang-repl as a backend to
xeus-cling.</li>
<li>Improve the xeus kernel protocol for clang-repl.</li>
<li>Prepare a blog post about clang-repl and possibly Jupyter.
Present the work at the relevant meetings and conferences.</li>
</ul>
</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/vgvassilev">Vassil Vassilev</a>,
<a href="https://github.com/davidlange6">David Lange</a></p>
<p><b>Desirable skills:</b>
Intermediate C++; Understanding of Clang and the Clang API in particular
</p>
<p><b>Project type:</b> Medium</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/clang-repl-tutorial-development-with-clang-repl/60365">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-repl-wasm">Add WebAssembly Support in clang-repl</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description:</b>
The Clang compiler is part of the LLVM compiler infrastructure and supports
various languages such as C, C++, ObjC and ObjC++. The design of LLVM and
Clang enables them to be used as libraries, and has led to the creation of
an entire compiler-assisted ecosystem of tools. The relatively friendly
codebase of Clang and advancements in the JIT infrastructure in LLVM further
enable research into different methods for processing C++ by blurring the
boundary between compile time and runtime. Challenges include incremental
compilation and fitting compile/link time optimizations into a more dynamic
environment.
</p>
<p>
Incremental compilation pipelines process code chunk-by-chunk by building an
ever-growing translation unit. Code is then lowered into the LLVM IR and
subsequently run by the LLVM JIT. Such a pipeline allows creation of
efficient interpreters. The interpreter enables interactive exploration and
makes the C++ language more user friendly. The incremental compilation mode
is used for interactive C++ in Jupyter via the xeus kernel protocol.
Newer versions of the protocol allow in-browser execution, opening up
further possibilities for clang-repl and Jupyter.
</p>
<p>
We are working to incorporate and possibly redesign parts of Cling in
mainline Clang through a new tool, clang-repl. The project aims to add
WebAssembly support in clang-repl and adopt it in xeus-clang-repl to aid
Jupyter-based C++.
</p>
<p><b>Expected result:</b>
There are several foreseen tasks:
<ul>
<li>Investigate feasibility of generating WebAssembly in a similar way to
the new <a href="https://reviews.llvm.org/D146389">interactive CUDA support</a>.</li>
<li>Enable generating WebAssembly in clang-repl.</li>
<li>Adopt the feature in xeus-clang-repl.</li>
<li>Prepare a blog post about clang-repl and possibly Jupyter.
Present the work at the relevant meetings and conferences.</li>
</ul>
</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/vgvassilev">Vassil Vassilev</a>,
<a href="https://github.com/alexander-penev">Alexander Penev</a></p>
<p><b>Desirable skills:</b>
Good C++; Understanding of Clang and the Clang API and the LLVM JIT in particular
</p>
<p><b>Project type:</b> Large</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/clang-repl-add-webassembly-support-in-clang-repl/69419">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_lld_embedded">LLD Linker Improvements for Embedded Targets</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The GNU toolchain is widely used for building embedded targets. There is growing momentum in the
Clang/LLVM community towards improving the Clang toolchain to support embedded targets. Using
the Clang toolchain as an alternative can help us improve code quality, find and fix security
bugs, improve developer experience, and take advantage of the new ideas and the momentum
surrounding the Clang/LLVM community in supporting embedded devices.
</p>
<p><b>A non-comprehensive list of improvements that can be made to LLD</b>:
<ul>
<li>
<p><b>--print-memory-usage support</b></p>
<p>"--print-memory-usage" in GCC provides a breakdown of the memory used in each memory region
defined in the linker file. Embedded developers use this flag to understand the impact on
memory. Often embedded systems define multiple memory regions with different space
constraints. Supporting this flag in the Clang toolchain will help projects
that wish to adopt it.</p>
</li>
<li>
<p><b>Linkmap</b></p>
<p>Currently, the LLD linker's linkmap output is not as rich as the BFD linker output.
Achieving feature parity on linkmap output will be highly
useful in analyzing the binaries created by the LLD linker. Further, outputting linkmap in
different formats (current LLD output, BFD, and JSON) can help build automation tools for
investigating the artifacts produced by the linker.</p>
</li>
<li>
<p><b>--print-gc-sections improvement</b></p>
<p>When the "--print-gc-sections" flag is enabled, LLD prints the sections that were
discarded during the linking process. This information currently does not include the
mapping between the symbol and the section groups, which is useful for debugging.
Preserving this information during the linking process will require modifications to
internal linker data structures.</p>
</li>
</ul>
</p>
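<p>As a reference point for the first item, GNU ld's <code>--print-memory-usage</code> prints a per-region table of used size, region size, and percentage used. A toy Python sketch of the underlying computation (the region and section data are made up; LLD would take them from the linker script and its output sections):</p>

```python
def print_memory_usage(regions, sections):
    """regions: region name -> region size in bytes (from the linker script).
    sections: (region name, section size) pairs for placed output sections.
    Prints a breakdown in the spirit of GNU ld's --print-memory-usage."""
    used = {name: 0 for name in regions}
    for region, size in sections:
        used[region] += size
    print(f"{'Memory region':<16}{'Used Size':>12}{'Region Size':>14}{'%age Used':>12}")
    for name, total in regions.items():
        pct = 100.0 * used[name] / total
        print(f"{name:<16}{used[name]:>10} B{total:>12} B{pct:>11.2f}%")
    return used

# Illustrative embedded memory map: 256 KiB flash, 64 KiB RAM.
used = print_memory_usage(
    {"FLASH": 256 * 1024, "RAM": 64 * 1024},
    [("FLASH", 100 * 1024), ("FLASH", 20 * 1024), ("RAM", 16 * 1024)],
)
```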
<p><b>Project size:</b> Medium or Large</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Skills:</b> C++</p>
<p><b>Expected result</b>:
<ul>
<li>Implementation of "--print-memory-usage" flag.</li>
<li>Support for new linkmap output formats 1. BFD and 2. JSON. </li>
<li>Improved "--print-gc-sections" output to include information about the surviving symbols.</li>
</ul>
</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/Prabhuk">Prabhu Rajasekaran</a>
<a href="https://github.com/petrhosek">Petr Hosek</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/lld-linker-improvements-for-embedded/68129">URL</a>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_mlir_presburger_opt">Optimizing MLIR’s Presburger library </a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><strong>Description</strong>: MLIR’s Presburger Library, FPL (<a href="https://grosser.science/FPL">https://grosser.science/FPL</a>), provides mathematical abstractions for polyhedral compilation and analysis. The main abstraction that the library provides is a set of integer tuples defined by a system of affine inequality constraints. The library supports standard set operations over such sets. The result will be a set defined by another constraint system, possibly having more constraints. When many set operations are performed in sequence, the constraint system may become very large, negatively impacting performance. There are several potential ways to simplify the constraint system; however, this involves performing additional computations. Thus, spending more time on more aggressive simplifications may make each individual operation slower, but at the same time, insufficient simplifications can make sequences of operations slow due to an explosion in constraint system size. The aim of this project is to find the right balance between the two.</p>
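<p>The size-versus-time tradeoff can be illustrated with a deliberately tiny model: one-dimensional sets where intersection simply concatenates constraints and simplification keeps only the tightest bounds. This is a toy Python sketch, not FPL's actual representation:</p>

```python
def intersect(s1, s2):
    """Intersect two constraint systems by concatenation -- the cheap
    operation that makes systems grow. Constraints here are 1-D only:
    ("ge", c) means x >= c, ("le", c) means x <= c; a toy stand-in for
    affine systems over integer tuples."""
    return s1 + s2

def simplify(system):
    """Drop redundant bounds: only the tightest lower and upper bound
    survive. In the real library, redundancy checks cost real time,
    which is exactly the tradeoff the project explores."""
    lows = [c for kind, c in system if kind == "ge"]
    highs = [c for kind, c in system if kind == "le"]
    out = []
    if lows:
        out.append(("ge", max(lows)))
    if highs:
        out.append(("le", min(highs)))
    return out

s = []
for i in range(10):  # a sequence of set operations...
    s = intersect(s, [("ge", i), ("le", 100 - i)])
print(len(s))       # 20 constraints accumulated
print(simplify(s))  # only 2 are actually needed
```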
<p><strong>The goals of this project:</strong></p>
<ul>
<li>Understand the library&#39;s performance in terms of runtime and output size.</li>
<li>Optimize the library by finding the best output size and performance tradeoff.</li>
</ul>
<p><strong>Expected outcomes</strong>:</p>
<ul>
<li>Benchmarking the performance and output constraint complexity of the primary operations of the library.</li>
<li>Implementing simplification heuristics.</li>
<li>A better understanding of which simplification heuristics improve overall performance enough to be worth the additional computational cost.</li>
</ul>
<p><strong>Desirable skills</strong>: Intermediate C++, Experience in benchmarking</p>
<p><strong>Project size</strong>: Large</p>
<p><strong>Difficulty</strong>: Medium</p>
<p><strong>Confirmed mentors</strong>: <a href="https://github.com/Groverkss">Kunwar Grover</a></p>
<p><strong>Discourse</strong>: <a href="https://discourse.llvm.org/t/mlir-optimizing-mlir-s-presburger-library/68213/1">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_mlir_query">Interactively query MLIR IR</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><strong>Description</strong>:
The project aims to develop an interactive query language for MLIR that enables developers to query the MLIR IR dynamically.
The tool will provide a REPL (or command-line) interface to enable users to query various properties of MLIR code, such as
"isConstant" and "resultOf". The proposed tool is intended to be similar to clang-query, which allows developers to match
AST expressions in C++ code using a TUI with autocomplete and other features.
</p>
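<p>To make the idea concrete, here is a toy Python sketch of such matchers (the op representation and matcher semantics are illustrative assumptions, not an existing MLIR API):</p>

```python
# Toy sketch of matchers an interactive MLIR query tool might expose.
# The names "isConstant" and "resultOf" come from the project description;
# the Op representation here is invented for illustration.

class Op:
    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)

def is_constant(op):
    return op.name == "arith.constant"

def result_of(op_name):
    """Matcher: op consumes a value produced by an op with the given name."""
    return lambda op: any(o.name == op_name for o in op.operands)

# %c = arith.constant ...  ;  %a = arith.addf %c, %c
c = Op("arith.constant")
a = Op("arith.addf", [c, c])

matches = [op for op in (c, a) if result_of("arith.constant")(op)]
```

The REPL would parse queries like `m resultOf("arith.constant")` into such predicate compositions, much as clang-query does for AST matchers.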
<p><strong>The goals of this project:</strong></p>
<ul>
<li>Understand the MLIR IR representation and common explorations users perform.</li>
<li>Implement a REPL to execute queries over MLIR IR.</li>
</ul>
<p><strong>Expected outcomes</strong>:</p>
<ul>
<li>A standalone tool that can be used to interactively explore IR.</li>
<li>Implement common matchers that are usable by the tool.</li>
<li>(stretch) Enable extracting parts of the IR matched by query into self-contained IR snippets.</li>
</ul>
<p><strong>Desirable skills</strong>: Intermediate C++, Experience in writing/debugging peephole optimizations</p>
<p><strong>Project size</strong>: Either medium or large.</p>
<p><strong>Difficulty</strong>: Medium</p>
<p><strong>Confirmed mentors</strong>: <a href="https://github.com/jpienaar">Jacques Pienaar</a></p>
<p><strong>Discourse</strong>: <a href="https://discourse.llvm.org/t/gsoc-proposal-interactive-mlir-query-tool-to-make-exploring-the-ir-easier/69601">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_mlgo_latency_model">Better performance models for MLGO training</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
We are using machine-guided compiler optimizations ("MLGO") for register allocation eviction and inlining for size, in
real-life deployments. The ML models have been trained with reinforcement learning algorithms. Expanding to more
performance areas is currently impeded by the poor prediction quality of our performance estimation models. Improving
those is critical to the effectiveness of reinforcement learning training algorithms, and therefore to applying
MLGO systematically to more optimizations.
</p>
<p><b>Project size:</b> either 175 or 350 hr.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Skills:</b> C/C++, some compiler experience, some Python. ML experience is a bonus.</p>
<p><b>Expected outcomes</b>: Better modeling of the execution environment by including additional runtime/profiling
information, such as additional PMU data, LLC miss probabilities or branch mispredictions. This involves (1) building
a data collection pipeline that covers additional runtime information, (2) modifying the ML models to allow processing
this data, and (3) modifying the training and inference process for the models to make use of this data.</p>
<p>Today, the models are almost pure static analysis; they see the instructions, but they make one-size-fits-all
assumptions about the execution environment and the runtime behavior of the code. The goal of this project is to move
from static analysis towards more dynamic models that better represent code the way it actually executes.</p>
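<p>As a sketch of this direction (all costs and numbers below are invented for illustration), a performance model can blend static per-instruction costs with profile-derived signals such as LLC miss probability:</p>

```python
# Minimal sketch, not the MLGO model: a latency estimate that starts from
# static per-instruction costs but scales memory operations by a measured
# cache-miss probability taken from runtime profiling. All numbers invented.

STATIC_COST = {"add": 1.0, "mul": 3.0, "load": 4.0, "store": 4.0}
MISS_PENALTY = 100.0  # assumed extra cycles on an LLC miss

def estimate_cycles(instrs, llc_miss_prob=0.0):
    total = 0.0
    for op in instrs:
        cost = STATIC_COST[op]
        if op in ("load", "store"):
            # Dynamic information: memory ops get more expensive when the
            # profile says they are likely to miss in the cache.
            cost += llc_miss_prob * MISS_PENALTY
        total += cost
    return total

code = ["load", "mul", "add", "store"]
static_only = estimate_cycles(code)            # one-size-fits-all view
profile_aware = estimate_cycles(code, 0.05)    # informed by PMU data
```

The same instruction sequence gets very different estimates once runtime behavior is taken into account, which is exactly the signal a reward model needs during reinforcement learning.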
<p><b>Mentors</b>
Ondrej Sykora, Mircea Trofin, Aiden Grossman
</p>
<p>
<b>Discourse</b>
<a href="https://discourse.llvm.org/t/better-performance-models-for-mlgo-training/68219">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang_analyzer_taint_analysis">Improve and Stabilize the Clang Static Analyzer's "Taint Analysis" Checks</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The Clang static analyzer comes with an experimental implementation of
taint analysis, a security-oriented analysis technique built to warn
the user about flow of attacker-controlled ("tainted") data into
sensitive functions that may behave in unexpected and dangerous ways
if the attacker is able to forge the right input. The programmer can address
such warnings by properly "sanitizing" the tainted data in order to
eliminate these dangerous inputs. A common example of a problem that can be
caught this way is <a href="https://xkcd.com/327/">SQL injections</a>.
A much simpler example, which is arguably much more relevant to users
of Clang, is buffer overflow vulnerabilities caused by attacker-controlled
numbers used as loop bounds while iterating over stack or heap arrays, or
passed as arguments to low-level buffer manipulating functions such as
<tt>memcpy()</tt>.
</p>
<p>
Being a static symbolic execution engine, the static analyzer implements
taint analysis by simply maintaining a list of "symbols" (named unknown
numeric values) that were obtained from known taint sources during the
symbolic simulation. Such symbols are then treated as potentially taking
arbitrary concrete values, as opposed to the general case of taking an
unknown subset of possible values. For example, division by an unchecked
unknown value doesn't necessarily warrant a division by zero warning,
because it's typically not known whether the value can be zero or not.
However, division by an unchecked <i>tainted</i> value does immediately
warrant a division by zero warning, because the attacker is free
to pass zero as an input. Therefore the static analyzer's taint
infrastructure consists of several parts: there is a mechanism for keeping
track of tainted symbols in the symbolic program state, there is a way to
define new sources of taint, and a few path-sensitive checks were taught to
consume taint information to emit additional warnings (like the division
by zero checker), acting as taint "sinks" and defining checker-specific
"sanitization" conditions.
</p>
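<p>The taint bookkeeping described above can be sketched in a few lines of Python (a toy model for illustration, not the analyzer's actual implementation):</p>

```python
# Toy model of the analyzer's taint machinery: sources record tainted
# symbols, taint propagates through derived values, and sinks such as
# division consult the taint set before warning.

tainted = set()

def taint_source(name):
    """Source: e.g. a value read from scanf() or a network socket."""
    tainted.add(name)
    return name

def derive(result, *operands):
    """Propagation: a value computed from tainted inputs is tainted."""
    if any(o in tainted for o in operands):
        tainted.add(result)
    return result

def check_division(divisor):
    """Sink: warn only for tainted divisors. An unknown-but-untainted
    divisor is not necessarily zero, so no warning is emitted for it."""
    return "division by tainted value" if divisor in tainted else None

n = taint_source("n")   # attacker-controlled input
m = derive("m", n)      # e.g. m = n + 1
```

Note how `check_division("k")` on an unrelated unknown value stays silent, while the derived value `m` triggers the sink.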
<p>
The entire facility is flagged as experimental: it's basically a
proof-of-concept implementation. It's likely that it can be made to work
really well, but it needs to go through some quality control by running it
on real-world source code, and a number of bugs need to be addressed,
especially in individual checks, before we can declare it stable.
Additionally, the tastiest check of them all – buffer overflow detection
based on tainted loop bounds or size parameters – was never implemented.
There is also a related check for array access with tainted index – which
is, again, experimental; let's see if we can declare this one stable
as well!
</p>
<p><b>Expected result:</b>
A number of taint-related checks either enabled by default for all users
of the static analyzer, or available as opt-in for users who care about
security. They're confirmed to have low false positive rate on real-world
code. Hopefully, the buffer overflow check is one of them.</p>
<p><b>Desirable skills:</b>
Intermediate C++ to be able to understand LLVM code. We'll run our analysis
on some plain C code as well. Some background in compilers or security is
welcome but not strictly necessary.
</p>
<p><b>Project size:</b> Either medium or large.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/haoNoQ">Artem Dergachev</a>,
<a href="https://github.com/xazax-hun">Gábor Horváth</a>,
<a href="https://github.com/ziqingluo-90">Ziqing Luo</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clang-improve-and-stabilize-the-static-analyzers-taint-analysis-checks/68235">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_mlgo_passes_2023">Machine Learning Guided Ordering of Compiler Optimization Passes</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
This continues the work of GSoC 2020 and <a href="https://summerofcode.withgoogle.com/archive/2021/projects/6411038932598784">2021</a>.
Developers generally use standard optimization pipelines like -O2 and -O3 to
optimize their code. Manually crafted heuristics are used to determine which
optimization passes to select and how to order the execution of those passes.
However, this process is not tailored for a particular program, or kind
of program, as it is designed to perform “reasonably well” for any input.
We want to improve the existing heuristics or replace the heuristics with
machine learning-based models so that the LLVM compiler can provide a superior
order of the passes customized per program.
The last milestone enabled feature extraction, and started investigating training
a policy for selecting a more appropriate pass pipeline.
</p>
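<p>The search component can be illustrated with a minimal Python sketch (the pass names and the cost model are invented stand-ins for "compile and measure"):</p>

```python
# Toy sketch of pass-ordering search: explore permutations of a small
# pass set and keep the cheapest order. The cost function is an invented
# stand-in for compiling a program and measuring it.

from itertools import permutations

def cost(pipeline):
    """Made-up model: running inlining before DCE is cheaper because
    DCE can clean up after the inliner."""
    order = {p: i for i, p in enumerate(pipeline)}
    c = len(pipeline)
    if order["inline"] < order["dce"]:
        c -= 1
    return c

passes = ["inline", "dce", "licm"]
best = min(permutations(passes), key=cost)
```

Exhaustive search only works for tiny pass sets; the project's ML angle is learning a policy that predicts good orders without measuring every permutation.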
<p><b>Project size:</b> either 175 or 350 hr.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Skills:</b> C/C++, some compiler experience. ML experience is a bonus.</p>
<p><b>Expected outcomes</b>: Pre-trained model selecting the most economical
optimization pipeline, with no loss in performance; hook-up of model in LLVM;
(re-)training tool; come up with new optimization sequences through search or learning.</p>
<p><b>Mentors</b>
Tarindu Jayatilaka, Mircea Trofin, Johannes Doerfert
</p>
<p>
<b>Discourse</b>
<a href="https://discourse.llvm.org/t/machine-learning-guided-ordering-of-compiler-optimization-passes/60415">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_code_coverage">Support a hierarchical directory structure in generated coverage html reports</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b><br>
Clang supports source-based coverage that shows which lines of code are covered by the executed tests
<a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html">[1]</a>.
It uses llvm-profdata <a href="https://llvm.org/docs/CommandGuide/llvm-profdata.html">[2]</a> and
llvm-cov <a href="https://llvm.org/docs/CommandGuide/llvm-cov.html">[3]</a> tools to generate coverage reports.
llvm-cov currently generates a single top-level index HTML file.
For example, a single top-level directory code coverage report
<a href="https://lab.llvm.org/coverage/coverage-reports/index.html">[4]</a>
for the LLVM repo is published on a coverage bot.
Top-level indexing causes rendering scalability issues in large projects,
such as Fuchsia <a href="https://fuchsia.dev">[5]</a>.
The goal of this project is to generate coverage HTML reports with a hierarchical structure
that matches the source directory structure, solving these scalability issues.
Chromium uses its own post-processing tools to show a per-directory hierarchical structure for coverage results
<a href="https://analysis.chromium.org/coverage/p/chromium">[6]</a>.
Similarly, Lcov, a graphical front-end for Gcov <a href="https://gcc.gnu.org/onlinedocs/gcc/Gcov.html">[7]</a>,
provides a one-level directory structure to display coverage results <a href="https://llvm.org/reports/coverage/index.html">[8]</a>. <br>
[1] <a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html">Source-based code coverage</a><br>
[2] <a href="https://llvm.org/docs/CommandGuide/llvm-profdata.html">llvm-profdata</a><br>
[3] <a href="https://llvm.org/docs/CommandGuide/llvm-cov.html">llvm-cov</a><br>
[4] <a href="https://lab.llvm.org/coverage/coverage-reports/index.html">LLVM coverage reports</a><br>
[5] <a href="https://fuchsia.dev">Fuchsia</a><br>
[6] <a href="https://analysis.chromium.org/coverage/p/chromium">Coverage summary for Chromium</a><br>
[7] <a href="https://gcc.gnu.org/onlinedocs/gcc/Gcov.html">Gcov</a><br>
[8] <a href="https://llvm.org/reports/coverage/index.html">Lcov coverage reports</a><br>
[9] <a href="https://github.com/llvm/llvm-project/issues/54711">Issue #54711: Support per-directory index files for HTML coverage report</a></p>
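<p>The core transformation is grouping llvm-cov's flat file list into per-directory indices, sketched below in Python (the paths are examples; this is not llvm-cov's actual code):</p>

```python
# Sketch of the missing piece: turn a flat list of per-file coverage
# entries into one index per directory, so each generated index page
# only lists its own children instead of the whole project.

import os
from collections import defaultdict

def per_directory_indices(files):
    indices = defaultdict(list)
    for path in files:
        indices[os.path.dirname(path)].append(os.path.basename(path))
    return dict(indices)

flat = [
    "llvm/lib/IR/Core.cpp",
    "llvm/lib/IR/Verifier.cpp",
    "clang/lib/Sema/Sema.cpp",
]
indices = per_directory_indices(flat)
```

A full implementation would also emit links from each directory index to its subdirectories and aggregate coverage percentages upward.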
<p><b>Expected result:</b> Implement support for a hierarchical directory structure in generated coverage HTML reports and demonstrate its use in the LLVM repo code coverage reports.</p>
<p><b>Project size:</b> Medium or Large</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/gulfemsavrun">Gulfem Savrun Yeniceri</a>
<a href="https://github.com/petrhosek">Petr Hosek</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/coverage-support-a-hierarchical-directory-structure-in-generated-coverage-html-reports/68239">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_map_value_to_src_expr">Map LLVM values to corresponding source-level expressions</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
Developers often use compiler generated remarks and analysis reports to optimize their code. While
compilers in general are good at including source code positions (i.e., line and column numbers) in the
generated messages, it is useful if these generated messages also include the corresponding source-level
expressions. The approach used by the LLVM implementation is to use a small set of intrinsic functions
to define a mapping between LLVM program objects and the source-level expressions. The goal of this
project is to use the information included within these intrinsic functions to either generate the
source expression corresponding to LLVM values or to propose and implement solutions to get the same if
the existing information is insufficient. Optimizing memory accesses in a program is important for
application performance. We specifically intend to use compiler analysis messages that report
source-level memory accesses corresponding to the LLVM load/store instructions that inhibit compiler
optimizations. As an example, we can use this information to report memory access dependences that
inhibit vectorization.
</p>
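<p>A minimal Python sketch of the desired interface (every name below is invented for illustration; the real implementation would walk LLVM's debug intrinsics):</p>

```python
# Sketch of the desired interface: given a map from IR values to source
# variables (as recorded by llvm.dbg.declare/llvm.dbg.value), reconstruct
# a source-level expression for an address computation. All names invented.

source_var = {"%a.addr": "a", "%i": "i"}

def source_expr(value, instrs):
    """Return a source-level string for an LLVM value, or None."""
    if value in source_var:
        return source_var[value]
    if value in instrs:
        op, args = instrs[value]
        if op == "getelementptr":
            base = source_expr(args[0], instrs)
            index = source_expr(args[1], instrs)
            return f"{base}[{index}]"
        if op == "load":
            return source_expr(args[0], instrs)
    return None

# %idx = getelementptr %a.addr, %i  ;  %v = load %idx
instrs = {
    "%idx": ("getelementptr", ["%a.addr", "%i"]),
    "%v": ("load", ["%idx"]),
}
```

An optimization remark could then report "dependence on a[i] inhibits vectorization" instead of printing an opaque IR value name.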
<p><b>Project size:</b> Medium</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Skills:</b> Intermediate C++, familiarity with LLVM core or willingness to learn the same.</p>
<p><b>Expected result:</b> Provide an interface which takes an LLVM value and returns a string corresponding
to the equivalent source-level expression. We are especially interested in using this interface to map
addresses used in load/store instructions to equivalent source-level memory references.</p>
<p><b>Confirmed Mentors:</b>
Satish Guggilla (satish.guggilla@intel.com)
Karthik Senthil (karthik.senthil@intel.com)
</p>
<p>
<b>Discourse:</b>
<a href="https://discourse.llvm.org/t/map-llvm-values-to-corresponding-source-level-expressions/68450">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="clangir">Build and run SingleSource benchmarks using ClangIR</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b><br>
Clang codegen works by emitting LLVM IR using AST visitors. In the
<a href="https://llvm.github.io/clangir/">ClangIR</a> project, we emit ClangIR (CIR)
from AST visitors too (CIRGen), and then lower to (a) LLVM IR directly or, alternatively,
(b) MLIR in-tree dialects. Lowering to LLVM is still quite immature and lacks
support for many instructions, attributes, and metadata.
ClangIR would greatly benefit from some level of parity with Clang AST →
LLVM IR codegen quality, in both performance and build time. This is key
for incrementally bridging correctness and performance testing, providing a
baseline for future higher level optimizations on top of C/C++.
A good starting point is to build and run simple benchmarks,
measuring both generated code and build time performance. LLVM's llvm-test-suite contains scripts and
machinery that easily allows checking correctness and collecting perf related data and its
<a href="https://github.com/llvm/llvm-test-suite/tree/main/SingleSource">SingleSource</a>
collection provides a set of simpler programs to build.
In a nutshell, while working on this project the student will bridge the
gaps in CIR → LLVM lowering, and at times fix any missing Clang AST → CIR
support. The work is going to be done incrementally on top of SingleSource
benchmarks, while measuring compiler build time and the performance of compiled
programs.
</p>
<p><b>Skills:</b>
Intermediate C++ programming skills; familiarity with compilers, LLVM IR,
MLIR, or Clang is a big plus, but a willingness to learn is also
welcome.
</p>
<p><b>Expected result:</b> Build and run programs from the SingleSource subdirectory of the
llvm-test-suite; collect and present results (performance and build time) against regular (upstream) Clang codegen.</p>
<p><b>Project size:</b> Large</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/bcardosolopes">Bruno Cardoso Lopes</a>
<a href="https://github.com/lanza">Nathan Lanza</a>
</p>
<p><b>Discourse:</b> <a href="https://discourse.llvm.org/t/clangir-build-and-run-singlesource-benchmarks-using-clangir/68473">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="enzyme_tblgen_extension">Move additional Enzyme Rules to Tablegen</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
Enzyme performs automatic differentiation (in the calculus sense) of LLVM programs. This enables users to use Enzyme to perform various algorithms such as back-propagation in ML or scientific simulation on existing code for any language that lowers to LLVM. The support for an increasing number of LLVM Versions (7-main), AD modes (Reverse, Forward, Forward-Vector, Reverse-Vector, Jacobian), and libraries (BLAS, OpenMP, MPI, CUDA, ROCm, ...) leads to a steadily increasing code base. In order to limit complexity and help new contributors we would like to express more parts of our core logic using LLVM Tablegen. The applicant is free to decide how to best map the program transformation abstractions within Enzyme to Tablegen.
</p>
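<p>The kind of rule table that TableGen records would generate can be sketched in Python (a toy forward-mode rule set; the representation is an assumption for illustration, not Enzyme's):</p>

```python
# Toy sketch in the spirit of generated derivative rules: a table mapping
# LLVM-like operations to their forward-mode tangent computations.
# The representation is invented for illustration.

FORWARD_RULES = {
    # op: f(x, y, dx, dy) -> tangent of the result
    "fadd": lambda x, y, dx, dy: dx + dy,
    "fmul": lambda x, y, dx, dy: dx * y + x * dy,  # product rule
}

def forward(op, x, y, dx, dy):
    """Apply the forward-mode rule for a binary op."""
    return FORWARD_RULES[op](x, y, dx, dy)
```

Expressing such rules declaratively (in TableGen rather than hand-written C++) is what keeps the combination of modes, versions, and libraries manageable.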
<p><b>Expected results:</b>
1. Extend the TableGen rule generation system within Enzyme to cover a new component besides the AdjointGenerator.
<br/>
2. Move several existing rules to the new autogenerated system (e.g. LLVM instructions, LLVM intrinsics, MPI calls, ...).
<br/>
</p>
<p><b>Confirmed mentors:</b>
<a href="https://github.com/zuseZ4">Manuel Drehwald</a>
<a href="mailto:wmoses@mit.edu">William Moses</a>
</p>
<p><b>Desirable skills:</b>
Good knowledge of C++, calculus, and LLVM and/or Clang, and/or MLIR internals. Experience with Tablegen, Enzyme or automatic differentiation would be nice, but can also be learned in the project.
</p>
<p><b>Project size:</b> Large</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Discourse</b> <a href="https://discourse.llvm.org/t/enzyme-move-additional-enzyme-rules-to-tablegen/69738">URL</a></p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_patch_coverage">Patch based test coverage for quick test feedback</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
Most of the day to day tests in LLVM are regression tests executed by <a href="https://llvm.org/docs/CommandGuide/lit.html">Lit</a>, structured as source code or IR to be passed to some binary, rather than test code directly calling the code to be tested.
This has many advantages but can make it difficult to predict which code path is executed when the compiler is invoked with a certain test input, especially for edge cases where error handling is involved.
The goal of this project is to help developers create good test coverage for their patch and enable reviewers to verify that they have done so.
To accomplish this we would like to introduce a tool that takes a patch as input, adds coverage instrumentation for the affected source files, runs the Lit tests, and records which test cases cause each counter to be executed.
For each counter we can then report the number of test cases executing it. Perhaps more importantly, we can also report the number of executing test cases whose results are changed by the patch, since a modified line that leaves all test results unchanged isn't properly tested, unless the change is intended to be non-functional.
This can be implemented in three separate parts:
<ol>
<li>Adding an option to llvm-lit to emit the necessary test coverage data, divided per test case (involves setting a unique value to <a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program"><code>LLVM_PROFILE_FILE</code></a> for each RUN)
<li>New tool to process the generated coverage data and the relevant git patch, and present the results in a user friendly manner
<li>Adding a way to non-intrusively (without changing build configurations) enable coverage instrumentation to a build. By building the project normally, touching the files changed by the patch, and rebuilding with <a href="https://github.com/llvm/llvm-project/blob/93a1fc2e18b452216be70f534da42f7702adbe1d/clang/tools/driver/driver.cpp#L79-L105"><code>CCC_OVERRIDE_OPTIONS</code></a> set to <a href="https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#compiling-with-coverage-enabled">add coverage</a> we can lower the overhead of generating and processing coverage of lines not relevant to the patch.
</ol>
The tooling in step 2 and 3 can be made completely agnostic of the actual test-runner, lowering the threshold for other test harnesses than Lit to implement the same functionality.
If time permits, adding this as a CI step would also be helpful for reviewers.
</p>
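<p>Step 2 can be sketched in Python (the diff and coverage formats below are simplified assumptions): extract the lines a patch adds and intersect them with per-test coverage:</p>

```python
# Sketch of step 2: parse a unified diff to find added lines, then
# report which test cases cover them. Handles a single hunk, which is
# enough for illustration; formats are simplified assumptions.

def added_lines(diff_text):
    """Return the set of new-file line numbers added by the patch."""
    added, lineno = set(), 0
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            # "@@ -a,b +c,d @@": the hunk starts at new-file line c.
            lineno = int(line.split("+")[1].split(",")[0])
        elif line.startswith("+") and not line.startswith("+++"):
            added.add(lineno)
            lineno += 1
        elif not line.startswith("-"):
            lineno += 1   # context line advances the new-file position
    return added

diff = """@@ -10,2 +10,3 @@
 context
+new line
 context"""

# Per-test covered lines, as produced by the coverage data from step 1.
coverage = {"test_a": {10, 11, 12}, "test_b": {30}}
changed = added_lines(diff)
covering = {t for t, lines in coverage.items() if changed & lines}
```

A line in `changed` covered by no test is exactly the "uncovered edge case" the tool should surface to the patch author.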
<p><b>Project size:</b> Small or medium</p>
<p><b>Difficulty:</b> Simple </p>
<p><b>Skills:</b> Python for Lit, data processing and <a href="https://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html">diff</a> processing. No compiler experience necessary. </p>
<p><b>Expected result:</b> Implement a new tool for use by the community. Developers get help finding uncovered edge cases during development, while also avoiding paranoid sprinkling of asserts or logs just to check that the code is actually executed. Reviewers can more easily check which parts of the patch are tested by each test. </p>
<p><b>Confirmed Mentors:</b>
<a href="https://github.com/hnrklssn">Henrik Olsson</a>
</p>
<p>
<b>Discourse:</b>
<a href="https://discourse.llvm.org/t/coverage-patch-based-test-coverage-for-quick-test-feedback/68628">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_sectiontitle">
<a name="gsoc22">Google Summer of Code 2022</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p>
Google Summer of Code 2022 was very successful for the LLVM project. For the
list of accepted and completed projects, please take a look at the Google
Summer of
Code <a href="https://summerofcode.withgoogle.com/archive/2022/organizations/llvm-compiler-infrastructure">website</a>.
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsection">
<a>LLVM</a>
</div>
<!-- *********************************************************************** -->
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_shared_jitlink">Implement a shared-memory based JITLinkMemoryManager for out-of-process JITting</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
Write a shared-memory based JITLinkMemoryManager.
<br />
LLVM’s JIT uses the JITLinkMemoryManager interface to allocate both working
memory (where the JIT fixes up the relocatable objects produced by the
compiler) and target memory (where the JIT’d code will reside in the target).
JITLinkMemoryManager instances are also responsible for transporting
fixed-up code from working memory to target memory. LLVM has an existing
cross-process allocator that uses remote procedure calls (RPC) to allocate
and copy bytes to the target process; however, a more attractive solution
(when the JIT and target process share the same physical memory) would be to
use shared memory pages to avoid copies between processes.
</p>
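<p>The core idea can be demonstrated in miniature with Python's standard-library shared memory, standing in for the proposed LLVM APIs: bytes written through one mapping are visible through a second handle without an explicit copy:</p>

```python
# Miniature demonstration of the shared-memory idea using Python's
# stdlib, standing in for the proposed LLVM APIs: a second handle to the
# same page sees writes without any inter-process copy.

from multiprocessing import shared_memory

# Allocate one shared page; open a second handle to it by name, the way
# a separate target process would.
shm = shared_memory.SharedMemory(create=True, size=16)
other = shared_memory.SharedMemory(name=shm.name)

shm.buf[:4] = b"\xc3\x90\x90\x90"   # write "fixed-up code" via one mapping
seen = bytes(other.buf[:4])          # read it back via the other mapping

other.close()
shm.close()
shm.unlink()
```

In the real JITLinkMemoryManager the two mappings would live in different processes, and the allocator must also handle permissions (e.g. making target pages executable) and deallocation.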
<p><b>Expected results:</b>
Implement a shared-memory based JITLinkMemoryManager:</p>
<ul>
<li>Write generic LLVM APIs for shared memory allocation.</li>
<li>
Write a JITLinkMemoryManager that uses these generic APIs to allocate
shared working-and-target memory.
</li>
<li>Make an extensive performance study of the approach.</li>
</ul>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
<a href=https://github.com/lhames>Lang Hames</a></p>
<p><b>Desirable skills:</b> Intermediate C++; Understanding of LLVM and the
LLVM JIT in particular; Understanding of virtual memory management APIs.
</p>
<p><b>Project type:</b> Large</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/implement-a-shared-memory-based-jitlinkmemorymanager-for-out-of-process-jitting">
URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_build_jit_tutorial">Modernize the LLVM "Building A JIT" tutorial series</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
The LLVM BuildingAJIT tutorial series teaches readers to build their own JIT
class from scratch using LLVM’s ORC APIs, however the tutorial chapters have
not kept pace with recent API improvements. Bring the existing tutorial
chapters up to speed, write up a new chapter on lazy compilation (chapter
code already available) or write a new chapter from scratch.
</p>
<p><b>Expected results:</b></p>
<ul>
<li>
Update chapter text for Chapters 1-3 -- Easy, but offers a chance to get
up-to-speed on the APIs.
</li>
<li>
Write chapter text for Chapter 4 -- Chapter code is already available,
but no chapter text exists yet.
</li>
<li>
Write a new chapter from scratch -- E.g. How to write an out-of-process
JIT, or how to directly manipulate the JIT'd instruction stream using
the ObjectLinkingLayer::Plugin API.
</li>
</ul>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
<a href=https://github.com/lhames>Lang Hames</a></p>
<p><b>Desirable skills:</b> Intermediate C++; Understanding of LLVM and the
LLVM JIT in particular; Familiarity with RST (reStructuredText); Technical
writing skills.
</p>
<p><b>Project type:</b> Medium</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/modernize-the-llvm-building-a-jit-tutorial-series">
URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_jit_new_format">Write JITLink support for a new format/architecture</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
JITLink is LLVM’s new JIT linker API -- the low-level API that transforms
compiler output (relocatable object files) into ready-to-execute bytes in
memory. To do this JITLink’s generic linker algorithm needs to be
specialized to support the target object format (COFF, ELF, MachO), and
architecture (arm, arm64, i386, x86-64). LLVM already has mature
implementations of JITLink for MachO/arm64 and MachO/x86-64, and a
relatively new implementation for ELF/x86-64. Write a JITLink implementation
for a missing target that interests you. If you choose to implement support
for a new architecture using the ELF or MachO formats then you will be able
to re-use the existing generic code for these formats. If you want to
implement support for a new target using the COFF format then you will need
to write both the generic COFF support code and the architecture support
code for your chosen architecture.
</p>
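<p>The heart of an architecture backend is applying relocations. Here is a simplified Python sketch of a 32-bit little-endian absolute address fixup (the encoding is reduced for illustration):</p>

```python
# Sketch of the basic linker concept a JITLink backend implements:
# applying a relocation, here a 32-bit little-endian absolute address
# fixup at a given offset in a section's bytes (simplified encoding).

import struct

def apply_abs32(section, offset, target_addr, addend=0):
    """Patch 4 bytes at `offset` with the target address plus addend."""
    value = (target_addr + addend) & 0xFFFFFFFF
    section[offset:offset + 4] = struct.pack("<I", value)

code = bytearray(b"\x00" * 8)     # stand-in for a section's raw bytes
apply_abs32(code, 2, 0x1000, addend=4)
```

A real backend defines one such fixup per relocation kind of the target architecture (PC-relative branches, GOT loads, etc.), plus the logic to parse them out of the object format.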
<p><b>Expected results:</b>
Write a JITLink specialization for a not-yet-supported format/architecture.
</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
<a href=https://github.com/weliveindetail>Stefan Gränitz</a>,
<a href=https://github.com/lhames>Lang Hames</a>
</p>
<p><b>Desirable skills:</b> Intermediate C++; Understanding of LLVM and the
LLVM JIT in particular; familiarity with your chosen format/architecture,
and basic linker concepts (e.g. sections, symbols, and relocations).
</p>
<p><b>Project type:</b> Large</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/write-jitlink-support-for-a-new-format-architecture">
URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="llvm_instrumentaion_for_compile_time">Instrumentation of Clang/LLVM for Compile Time</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
Every developer, at some point (usually while waiting for their program to
compile), has asked "Why is it taking so long?" This project is to seek an
answer to this question. There exists within LLVM, and by extension CLANG,
a timing infrastructure that records events within the compiler. However,
its utilization is inconsistent and insufficient. This can be improved by
adding more instrumentation throughout LLVM and CLANG but one must be careful.
Too much instrumentation, or instrumenting the wrong things, can be confusing
and overwhelming, thus making it no more useful than not enough information.
The trick is to find the right places to instrument and to control the
instrumentation. Seeking out these key spots will take you through the
entire compilation process, from preprocessing through to final code
generation, and all phases between. As you instrument the code, you will
look at the data as you evolve it, which will further direct your search.
You will develop new ways to control and filter the information to allow a
better understanding of where the compiler is spending its time. You will
seek out and develop example test inputs that illustrate where the compiler
can be improved, which will in turn, help direct your instrumenting and search.
You will consider and develop ways of controlling the instrumentation to
allow better understanding and detailed examination of phases of compilation.
Through all of this, you will gain an understanding of how a compiler works,
from front end processing, through the LLVM optimization pipeline, through
to code generation. You will see, and understand, the big picture of what
is required to compile and optimize a C/C++ program, and in particular, how
CLANG, LLVM and LLC accomplish these tasks. Your mentors have a combined
experience of approximately 25 years of compiler development and around 8
years of experience with LLVM itself to help you on your quest.
</p>
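<p>The existing infrastructure records nested, named timers; a miniature Python analogue (names invented) shows the shape of the data involved:</p>

```python
# Miniature analogue of a compiler's timing infrastructure: nested,
# named timers recording where time is spent. Region names are invented.

import time
from contextlib import contextmanager

timings = {}
_stack = []

@contextmanager
def timed(name):
    """Time a region; nesting is reflected in slash-separated keys."""
    _stack.append(name)
    key = "/".join(_stack)
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[key] = timings.get(key, 0.0) + time.perf_counter() - start
        _stack.pop()

with timed("frontend"):
    with timed("parse"):
        sum(range(1000))   # stand-in for real work
```

The project's challenge is deciding which regions deserve a timer and how to filter the output so the hierarchy stays readable rather than overwhelming.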
<p><b>Expected results:</b>
<ul>
<li>Targeted expansion of the use of the existing timing infrastructure</li>
<li>Identification of appropriate test inputs for improving compile time</li>
<li>Identification of compile time hotspots</li>
<li>New and improved methods of controlling the timing infrastructure</li>
</ul>
</p>
<p><b>Confirmed Mentor:</b> Jamie Schmeiser, Whitney Tsang</p>
<p><b>Desirable skills:</b> C++ programming skills; CLANG/LLVM knowledge an asset but not necessary; self motivated; curiosity; desire to learn</p>
<p><b>Project type:</b> 175 or 350 hour</p>
<p><b>Difficulty Rating:</b> Easy - Medium</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/instrumentation-of-clang-llvm-for-compile-time">
URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_mlgo_passes">Machine Learning Guided Ordering of Compiler Optimization Passes</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
This continues the work of GSoC 2020 and <a href="https://summerofcode.withgoogle.com/archive/2021/projects/6411038932598784">2021</a>.
Developers generally use standard optimization pipelines like -O2 and -O3 to
optimize their code. Manually crafted heuristics are used to determine which
optimization passes to select and how to order the execution of those passes.
However, this process is not tailored for a particular application, or kind
of application, as it is designed to perform “reasonably well” for any input.
We want to improve the existing heuristics or replace the heuristics with
machine learning-based models so that the LLVM compiler can provide a superior
order of the passes customized per application.
The last milestone enabled feature extraction and started investigating training
a policy for selecting a more appropriate pass pipeline.
</p>
<p><b>Project size:</b> either 175 or 350 hr.</p>
<p><b>Difficulty:</b> Medium</p>
<p><b>Skills:</b> C/C++, some compiler experience. ML experience is a bonus.</p>
<p><b>Expected outcomes</b>: Pre-trained model selecting the most economical
optimization pipeline, with no loss in performance; hook-up of model in LLVM;
(re-)training tool.</p>
<p><b>Mentors</b>
Tarindu Jayatilaka, Mircea Trofin, Johannes Doerfert
</p>
<p>
<b>Discourse</b>
<a href="https://discourse.llvm.org/t/machine-learning-guided-ordering-of-compiler-optimization-passes/60415">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_mlgo_loop">Learning Loop Transformation Policies</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
This project is a continuation of last <a href="https://summerofcode.withgoogle.com/archive/2021/projects/5732097817313280">year’s</a>.
In 2021, the project achieved its first milestone - separating correctness
decisions from policy decisions. This opens up the possibility of replacing
the latter with machine-learned ones.
Rough milestones: 1) select an initial set of features and use the existing ML
Guided Optimizations (MLGO) infra to generate training logs; 2) define a reward
signal, computable at compile time, to guide a reinforcement learning training loop;
3) iterate through training and refine the reward/feature set.
</p>
<p><b>Project size:</b> either 175 or 350 hr, ideally 350 hr</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Skills:</b> C/C++, some compiler experience. ML experience is a bonus.</p>
<p><b>Expected outcomes</b>: policy ('advisor') interface for loop unrolling,
with current heuristic as default implementation; set up feature extraction
for reinforcement learning training; set up a reward metric; set up training
algorithm, and iterate over policy training</p>
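The "policy ('advisor') interface" in the outcomes above can be sketched roughly as follows; all names here are hypothetical, not LLVM's actual interfaces. The key idea is that legality facts are computed separately, so the advisor only chooses among decisions already known to be correct, and a learned policy can later be swapped in for the default heuristic.

```cpp
// Hypothetical per-loop facts that correctness analysis has already verified;
// the advisor only picks among legal options.
struct LoopInfoFacts {
  unsigned TripCount; // 0 if unknown
  unsigned LoopSize;  // rough instruction count
  bool HasCalls;
};

// Policy interface: given the facts, pick an unroll factor (1 = no unroll).
struct UnrollAdvisor {
  virtual ~UnrollAdvisor() = default;
  virtual unsigned getUnrollFactor(const LoopInfoFacts &F) = 0;
};

// Default implementation mirroring a simple hand-written heuristic
// (thresholds invented for illustration).
struct HeuristicUnrollAdvisor : UnrollAdvisor {
  unsigned getUnrollFactor(const LoopInfoFacts &F) override {
    if (F.HasCalls || F.TripCount == 0)
      return 1;               // be conservative
    if (F.TripCount <= 4)
      return F.TripCount;     // fully unroll tiny loops
    if (F.LoopSize <= 16)
      return 4;               // partially unroll small bodies
    return 1;
  }
};

// An ML-backed advisor would subclass UnrollAdvisor, turn LoopInfoFacts into
// a feature vector, and return the factor predicted by a trained policy.
```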
<p><b>Mentors</b>
Johannes Doerfert, Mircea Trofin
</p>
<p>
<b>Discourse</b>
<a href="https://discourse.llvm.org/t/learning-loop-transformation-policies/60413">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_module_inliner">Evaluate and Expand the Module-Level Inliner</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
LLVM's inliner is a bottom-up, strongly-connected component-level pass. This
places limits on the order in which call sites are evaluated, which impacts
the effectiveness of inlining.
We now have a functional Module Inliner as a result of <a href="https://summerofcode.withgoogle.com/archive/2021/projects/5195658885070848">GSoC2021 work</a>.
We want to explore call site priority schemes, the effectiveness/frequency of
running function passes after successful inlinings, and the interplay with the
ML inline advisor, to name a few areas of exploration.
</p>
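To illustrate one of these areas: a module-level inliner is free to process call sites in any global order, for example via a priority queue keyed on a profitability estimate, whereas the bottom-up SCC inliner is constrained by the call graph. A toy sketch of such an ordering (names and the priority metric are invented for illustration; in LLVM each candidate would wrap a CallBase plus cost-model output):

```cpp
#include <queue>
#include <string>
#include <vector>

// A candidate call site with a precomputed profitability estimate.
struct CallSiteCandidate {
  std::string Caller, Callee;
  int Benefit; // higher = more profitable to inline (invented metric)
};

struct ByBenefit {
  bool operator()(const CallSiteCandidate &A,
                  const CallSiteCandidate &B) const {
    return A.Benefit < B.Benefit; // max-heap on Benefit
  }
};

// Drain candidates in priority order; unlike the bottom-up SCC inliner,
// nothing constrains this order to follow the call graph.
std::vector<std::string> inlineOrder(std::vector<CallSiteCandidate> Sites) {
  std::priority_queue<CallSiteCandidate, std::vector<CallSiteCandidate>,
                      ByBenefit>
      Q(Sites.begin(), Sites.end());
  std::vector<std::string> Order;
  while (!Q.empty()) {
    Order.push_back(Q.top().Caller + "->" + Q.top().Callee);
    Q.pop(); // a real inliner would inline here and push newly exposed sites
  }
  return Order;
}
```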
<p><b>Project size:</b> either 175 or 350 hr, ideally 350 hr, milestones allow
for 175hr scoping</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Skills:</b> C/C++, some compiler experience.</p>
<p><b>Expected outcomes</b>: Proposal and Evaluation of alternative traversal
orders; evaluation of 'clustering' inlining decisions (inline more than one
call site at a time); evaluation of effectiveness/frequency of function
optimization passes after inlining
</p>
<p><b>Mentors</b>
Kazu Hirata, Liqiang Tao, Mircea Trofin
</p>
<p>
<b>Discourse</b>
<a href="https://discourse.llvm.org/t/evaluate-and-expand-the-module-level-inliner/60525">URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_lto_dependency_info">Richer symbol dependency information for LTO</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project:</b>
C and C++ programs are often composed of various object files produced from
separately-compiled source files that are then linked together.
When compiling one source file, knowledge that can be derived from the logic
contained within the other source files would normally not be available.
Link-time optimization, also known as LTO, is a way for optimization to be
done using information from more than one source file.
</p>
<p>In LLVM, LTO is achieved by using LLVM bitcode objects as the output from
the "compile" step and feeding those objects into the link step.
LLVM's LTO operates in conjunction with the linker.
The linker is invoked by the user and the linker in turn drives LLVM's LTO
when it encounters LLVM bitcode files, getting information from LTO about
what symbols a bitcode object defines or references.
Information about what symbols are defined in or referenced from an object
is necessary for the linker to perform symbol resolution, and a linker is
normally able to extract such information from regular (non-bitcode) object
files.
</p>
<p>The implied consequences of LLVM's LTO implementation
with respect to linker GC
(linker garbage collection) can be improved, especially for aggressive forms
of linker GC with lazy inclusion of objects and sections.
In particular, the symbols referenced but undefined by an LTO module are,
to the linker, monolithic at the module level.
At the same time, the symbols referenced but undefined by regular
(non-LTO) objects are monolithic to LTO.
Together, this means that the inclusion of an LTO module
into the overall process potentially leads, in the linker's initial symbol
resolution, to all the undefined symbols in that module being considered as
referenced; in turn, additional artifacts (e.g., archive members) may be
added into the resolution, which further leads to references that may
resolve to symbols defined in LTO modules and a premature conclusion that
the definitions of these symbols are needed.
This means, at a minimum, that potentially unnecessary codegen is done for
functions that will be garbage-collected in the end (a waste of electricity
and time).
</p>
<p>We acknowledge that an ideal implementation probably involves a "coroutine"
like interaction between the linker and LTO codegen where information flows
back and forth; however, such an endeavour is invasive to both linkers and
to LLVM.
</p>
<p>We believe that by</p>
<ul>
<li>having the linker register, via an API to LTO, symbol reference "nodes"
modelling the relationship between a symbol and the symbols that are
referenced in turn from (the object file section containing) its
linker-selected definition, and
</li>
<li>using that information in LTO processing,</li>
</ul>
<p>the LTO processing will be able to effectively identify a more accurate set
of LTO symbols that are visible outside of the LTO unit.
The linker then needs to identify only exported symbols and entry points
(such as the entry point for an executable and functions involved in
initialization and finalization).
</p>
<p>Having the LLVM opt/codegen understand the dependency implications from the
"outside world" is strictly better than the other direction: the symbols
referred to by relocations in non-LTO code are pretty much fixed as compiled
(whereas some references in LTO code may disappear with optimization).
</p>
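The information the linker would register can be thought of as a reachability graph over symbols: starting from exported symbols and entry points, walk the recorded references to discover which definitions are actually reachable from regular objects; only those need to be treated as visible outside the LTO unit. A toy sketch of that computation (all names hypothetical, not the actual LTO C++ API):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// For each symbol whose linker-selected definition is in a regular (non-LTO)
// object, the symbols referenced from (the section containing) that
// definition. Hypothetical stand-in for the per-symbol "nodes" the linker
// would register with LTO.
using RefGraph = std::map<std::string, std::vector<std::string>>;

// Compute which symbols are transitively referenced starting from the
// exported symbols / entry points.
std::set<std::string>
visibleToRegularObj(const RefGraph &Refs,
                    const std::vector<std::string> &Roots) {
  std::set<std::string> Seen(Roots.begin(), Roots.end());
  std::vector<std::string> Work(Roots.begin(), Roots.end());
  while (!Work.empty()) {
    std::string Sym = Work.back();
    Work.pop_back();
    auto It = Refs.find(Sym);
    if (It == Refs.end())
      continue; // definition is in an LTO module or has no recorded references
    for (const std::string &Ref : It->second)
      if (Seen.insert(Ref).second)
        Work.push_back(Ref);
  }
  return Seen;
}
```

With only the entry point as a root, a symbol referenced solely from an unreferenced archive member never enters the set, so its LTO definition can be internalized and potentially dropped instead of prematurely code-generated.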
<p><b>Expected results:</b></p>
<ol>
<li>Modification of the C++ LTO interface used by LLD to implement an
interface to record the symbol reference dependency data (incorporating
awareness of sections and comdats). This may additionally include a
method to add LTO objects provisionally, simulating behaviours where
linkers only add objects as needed.
</li>
<li>
Modification of LTO to use new symbol reference information
for definitions in regular objects when visiting definitions
in the IR prior to the
internalization pass to discover (transitive) symbol references and
record the so-referenced symbols as being visible to regular objects.
This may additionally include the "late" incorporation of LTO objects
added provisionally into the merged LTO module.
</li>
<li>Modification of LLD (for ELF) to modify initial resolution to use the
new interface as a replacement for setting
<code>VisibleToRegularObj</code>
except for entry point functions (including C++ dynamic initialization and
finalization).
</li>
</ol>
<p><b>Confirmed Mentors:</b>
Sean Fertile,
Hubert Tong,
Wael Yehia
</p>
<p><b>Desirable skills:</b>
Intermediate C++;
basic linker concepts (e.g., symbols, sections, and relocations)
</p>
<p><b>Project size:</b> 350 hours</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/richer-symbol-dependency-information-for-lto/60335">
URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_undef_load">Remove undef: move uninitialized memory to poison</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
The existence of the undef value in LLVM prevents several optimizations,
even in programs where it is not used. Therefore, we have been trying to
move all uses of undef to poison so we can eventually remove undef from
LLVM.<br/>
This project focuses on uninitialized memory: right now the semantics of
LLVM is that loading a value from uninitialized memory yields an undef value.
This prevents, for example, SROA/mem2reg from optimizing conditional loads,
because phi(undef, %x) cannot be replaced with %x, since %x might be poison.<br/>
This project consists of devising a consistent semantics for uninitialized memory
(based on existing proposals), an upgrade plan for LLVM, and implementing
the changes in LLVM and clang.
In clang the changes should be specific to bit-fields.<br/>
For more information see the following
<a href="https://github.com/llvm/llvm-project/issues/52930">discussion</a>
and/or contact the mentor.<br/>
Further reading:
<a href="https://web.ist.utl.pt/nuno.lopes/pubs/llvmmem-oopsla18.pdf">introduction to LLVM's memory model</a>.
</p>
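A C++ analogue of the blocked optimization (the phi is what mem2reg produces in the IR; the C++ code itself is just an illustration):

```cpp
// After mem2reg, the value of X at the return becomes
//   phi [undef, entry], [%v, if.then]
// Folding phi(undef, %v) to %v would be desirable, but it is invalid under
// the current semantics because %v might be poison, and undef may not be
// replaced by poison. Making loads of uninitialized memory yield poison
// instead of undef removes this asymmetry and unblocks the fold.
int conditionalInit(bool Cond, int V) {
  int X; // uninitialized: a load of X currently yields undef
  if (Cond)
    X = V;
  return X; // reading X when !Cond is the undef/poison-sensitive case
}
```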
<p><b>Project size:</b> 350 hr</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Skills:</b> Intermediate C++</p>
<p><b>Expected outcomes</b>:
<ul>
<li>Semantics for memory operations that removes the need for undef
values</li>
<li>Upgrade plan for LLVM and frontends</li>
<li>Implementation of the proposed semantics in LLVM</li>
<li>Implementation of auto-upgrade path for old LLVM IR files</li>
<li>Implementation of fixes in clang to use the new IR features</li>
<li>Benchmarking to check for regressions and/or perf improvements</li>
</ul>
</p>
<p><b>Mentors:</b>
<a href="https://web.ist.utl.pt/nuno.lopes/">Nuno Lopes</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a id="llvm_abi_export">Add API/ABI export annotations to the LLVM build</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project</b>
<p>Currently, all libraries inside LLVM export all their symbols publicly. When
linking statically against them, the linker will remove unused symbols and this
is not a problem.</p>
<p>When the libraries are built as shared libraries however, the number of exported
symbols is very large and symbols that are meant to be internal spill into the
public ABI of the shared libLLVM.so.</p>
<p>In this project, we’d like to change the default visibility of library symbols
to “hidden”, add an annotation macro to LLVM, and use the macro to gradually move
the entire library in this direction. This will eventually enable building the
shared libLLVM.so on Windows as well.</p>
<p>In practice, this means adding -fvisibility=hidden to individual libraries and
annotating exported symbols with the LLVM export annotation.</p>
<p>We would like this work to be as unintrusive to other developers’ workflows as
possible, so starting with a small internal library would be beneficial,
e.g. one of the LLVM targets or IR passes.</p>
<p>For further reading, there is a Discourse thread available that discusses the
idea behind this proposal:
<a href="https://discourse.llvm.org/t/supporting-llvm-build-llvm-dylib-on-windows/58891">
Supporting LLVM_BUILD_LLVM_DYLIB on Windows</a>
as well as the linked Phabricator review with a patch implementing the functionality:
<a href="https://reviews.llvm.org/D109192">⚙ D109192 [WIP/DNM] Support:
introduce public API annotation support</a>
None of this work has been committed yet but can be used as a starting point
for this proposal.</p>
</p>
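The annotation pattern is the standard one used by many cross-platform libraries; a minimal sketch of what the macro could look like, with the name LLVM_ABI chosen purely for illustration (settling on the actual macro is part of the project):

```cpp
// Sketch of a visibility/export annotation macro; all names illustrative.
// With -fvisibility=hidden, only symbols carrying the macro appear in the
// shared library's dynamic symbol table; on Windows the same macro expands
// to dllexport when building the DLL and dllimport when consuming it.
#if defined(_WIN32)
  #if defined(LLVM_EXPORTS) // defined when building the DLL itself
    #define LLVM_ABI __declspec(dllexport)
  #else
    #define LLVM_ABI __declspec(dllimport)
  #endif
#else
  #define LLVM_ABI __attribute__((visibility("default")))
#endif

// Public entry point: explicitly annotated, so it stays visible even under
// -fvisibility=hidden.
LLVM_ABI int publicEntryPoint(int X) { return X + 1; }

// No annotation: hidden by default, internal to the shared library.
int internalHelper(int X) { return X * 2; }
```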
<p><b>Project size:</b> Medium</p>
<p><b>Difficulty:</b> Easy</p>
<p><b>Skills:</b> Build systems, CMake, LLVM</p>
<p><b>Expected outcomes</b>:
<ul>
<li>Export macro implemented and committed to LLVM</li>
<li>At least one internal target ported to the new export scheme</li>
</ul>
</p>
<p><b>Mentors:</b>
Timm Bäder, Tom Stellard
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsection">
<a>Clang</a>
</div>
<!-- *********************************************************************** -->
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-template-instantiation-sugar">Extend clang AST to provide
information for the type as written in template instantiations.</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project: </b>
When instantiating a template, the template arguments are canonicalized
before being substituted into the template pattern. Clang does not preserve
type sugar when subsequently accessing members of the instantiation.
<pre>
std::vector&lt;std::string&gt; vs;
int n = vs.front(); // bad diagnostic: [...] aka 'std::basic_string&lt;char&gt;' [...]
template&lt;typename T&gt; struct Id { typedef T type; };
Id&lt;size_t&gt;::type // just 'unsigned long', 'size_t' sugar has been lost
</pre>
Clang should "re-sugar" the type when performing member access on a class
template specialization, based on the type sugar of the accessed
specialization. The type of vs.front() should be std::string, not
std::basic_string&lt;char, [...]&gt;.
<br /> <br />
Suggested design approach: add a new type node to represent template
argument sugar, and implicitly create an instance of this node whenever a
member of a class template specialization is accessed. When performing a
single-step desugar of this node, lazily create the desugared representation
by propagating the sugared template arguments onto inner type nodes (and in
particular, replacing Subst*Parm nodes with the corresponding sugar). When
printing the type for diagnostic purposes, use the annotated type sugar to
print the type as originally written.
<br /> <br />
For good results, template argument deduction will also need to be able to
deduce type sugar (and reconcile cases where the same type is deduced twice
with different sugar).
</p>
<p><b>Expected results: </b>
Diagnostics preserve type sugar even when accessing members of a template
specialization. T&lt;unsigned long&gt; and T&lt;size_t&gt; are still the
same type and the same template instantiation, but
T&lt;unsigned long&gt;::type single-step desugars to 'unsigned long' and
T&lt;size_t&gt;::type single-step desugars to 'size_t'.</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/vgvassilev>Vassil Vassilev</a>,
<a href=https://github.com/zygoloid>Richard Smith</a></p>
<p><b>Desirable skills:</b>
Good knowledge of clang API, clang's AST, intermediate knowledge of C++.
</p>
<p><b>Project type:</b> Large</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/clang-extend-clang-ast-to-provide-information-for-the-type-as-written-in-template-instantiations">
URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-sa-structured-bindings">Implement support for
C++17 structured bindings in the Clang Static Analyzer</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description of the project: </b>
Even though a lot of new C++ features are supported by the static analyzer
automatically, by virtue of the clang AST doing all the work under the hood,
the C++17 "structured binding" syntax
<pre> auto [x, y] = ...;</pre>
requires some extra work on the Static Analyzer side. The analyzer's transfer functions
need to be taught about the new AST nodes, <a href="https://clang.llvm.org/doxygen/classclang_1_1BindingDecl.html">BindingDecl</a>
and <a href="https://clang.llvm.org/doxygen/classclang_1_1DecompositionDecl.html">DecompositionDecl</a>,
to work correctly in all <a href="https://en.cppreference.com/w/cpp/language/structured_binding">three interpretations</a>
described by the Standard.
<br /><br />
Incomplete support for structured bindings is a common source of
false positives in the uninitialized variable checker on modern C++ code,
such as <a href="https://github.com/llvm/llvm-project/issues/42387">#42387</a>.
<br /><br />
It is likely that the Clang CFG also needs to be updated. Such changes in
the CFG may improve quality of clang warnings outside of
the Static Analyzer.
</p>
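For reference, the three interpretations the transfer functions must handle, shown as plain C++ the analyzer would see (not analyzer code):

```cpp
#include <tuple>

struct Point { int X, Y; };

int sumAllThree() {
  // 1. Binding to an array: the names refer to the array elements.
  int Arr[2] = {1, 2};
  auto [A0, A1] = Arr;

  // 2. Tuple-like case: the names bind to the results of std::get<I>.
  std::tuple<int, int> T{3, 4};
  auto [T0, T1] = T;

  // 3. Binding to the non-static data members of a struct.
  Point P{5, 6};
  auto [PX, PY] = P;

  // A checker that does not model BindingDecl/DecompositionDecl may falsely
  // flag these names as uninitialized even though every value above is
  // fully initialized.
  return A0 + A1 + T0 + T1 + PX + PY;
}
```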
<p><b>Expected results: </b>
The Static Analyzer correctly models structured binding and decomposition
declarations. In particular, binding variables no longer appear
uninitialized to the Static Analyzer's uninitialized variable checker.</p>
<p><b>Confirmed Mentor:</b>
<a href=https://github.com/haoNoQ>Artem Dergachev</a>,
<a href=https://github.com/t-rasmud>Rashmi Mudduluru</a>,
<a href=https://github.com/xazax-hun>Gábor Horváth</a>,
<a href=https://github.com/Szelethus>Kristóf Umann</a>
</p>
<p><b>Desirable skills:</b>
Intermediate knowledge of C++. Some familiarity with Clang AST and/or
some static analysis background.
</p>
<p><b>Project size:</b> 350 hr</p>
<p><b>Difficulty:</b> Medium/Hard</p>
<p><b>Discourse</b>
<a href="https://discourse.llvm.org/t/implement-support-for-c-17-structured-bindings-in-the-clang-static-analyzer/60588">
URL</a>
</p>
</div>
<!-- *********************************************************************** -->
<div class="www_subsubsection">
<a name="clang-improve-diagnostics">Improve Clang Diagnostics.</a>
</div>
<!-- *********************************************************************** -->
<div class="www_text">
<p><b>Description:</b>
Clang's diagnostics, which issue warnings and errors to the programmer, are a critical
feature of the compiler. Great diagnostics can have a significant impact on the
user experience of the compiler and increase programmer productivity.
</p>
<p>Recent improvements in GCC
<a href="https://developers.redhat.com/blog/2018/03/15/gcc-8-usability-improvements"> [1] </a>
<a href="https://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9/"> [2] </a>
show that there is significant headroom to improve diagnostics
(and user interactions in general). It would be a very impactful project
to survey and identify all the possible improvements to clang on this
topic and start redesigning the next generation of our diagnostics.
</p>
<p>
In addition, we will triage issues reported on the LLVM GitHub issue tracker labeled
with <a href="https://github.com/llvm/llvm-project/labels/clang%3Adiagnostics"> clang-diagnostics</a>;
if they need fixing, we will prepare patches, and otherwise simply close them.
</p>
<p><b>Expected outcomes</b>:
Diagnostics will be improved:
<ul>
<li>Improve diagnostic aesthetics</li>
<li>Cover missing diagnostics</li>
<li>Reduce false positive rate</li>
<li>Reword diagnostics</li>
</ul>
</p>
<p><b>Confirmed Mentor:</b>
<a href