[mlir] Optimize ThreadLocalCache by removing atomic bottleneck (attempt #3) (#93315) The ThreadLocalCache implementation is used by the MLIRContext (among other things) to try to manage thread contention in the StorageUniquers. There is a bunch of fancy shared pointer/weak pointer setups that basically keeps everything alive across threads at the right time, but a huge bottleneck is the `weak_ptr::lock` call inside the `::get` method. This is because the `lock` method has to hit the atomic refcount several times, and this is bottlenecking performance across many threads. However, all this is doing is checking whether the storage is initialized. Importantly, when the `PerThreadInstance` goes out of scope, it does not remove all of its associated entries from the thread-local hash map (it contains dangling `PerThreadInstance *` keys). The `weak_ptr` also allows the thread local cache to synchronize with the `PerThreadInstance`'s destruction: 1. if `ThreadLocalCache` destructs, the `weak_ptr`s that reference its contained values are immediately invalidated 2. if `CacheType` destructs within a thread, any entries still live are removed from the owning `PerThreadInstance`, and it locks the `weak_ptr` first to ensure it's kept alive long enough for the removal. This PR changes the TLC entries to contain a `shared_ptr<ValueT*>` and a `weak_ptr<PerInstanceState>`. It gives the `PerInstanceState` entries a `weak_ptr<ValueT*>` on top of the `unique_ptr<ValueT>`. This enables `ThreadLocalCache::get` to check if the value is initialized by dereferencing the `shared_ptr<ValueT*>` and check if the contained pointer is null. When `PerInstanceState` destructs, the values inside the TLC are written to nullptr. The TLC uses the `weak_ptr<PerInstanceState>` to satisfy (2). (1) is no longer the case. When `ThreadLocalCache` begins destruction, the `weak_ptr<PerInstanceState>` are invalidated, but not the `shared_ptr<ValueT*>`. This is OK: because the overall object is being destroyed, `::get` cannot get called and because the `shared_ptr<PerInstanceState>` finishes destruction before freeing the pointer, it cannot get reallocated to another `ThreadLocalCache` during destruction. I.e. the values inside the TLC associated with a `PerInstanceState` cannot be read during destruction. The most important thing is to make sure destruction of the TLC doesn't race with the destructor of `PerInstanceState`. Because `PerInstanceState` carries `weak_ptr` references into the TLC, we guarantee to not have any use-after-frees.
Welcome to the LLVM project!
This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.
The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.
C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.
Other components include: the libc++ C++ standard library, the LLD linker, and more.
Consult the Getting Started with LLVM page for information on building and running LLVM.
For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.
Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.
The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.