[tsan] Introduce Adaptive Delay Scheduling to TSAN (#178836)

This commit introduces an "adaptive delay" feature to the
ThreadSanitizer runtime to improve race detection by perturbing thread
schedules. At various synchronization points (atomic operations,
mutexes, and thread lifecycle events), the runtime may inject small
delays (spin loops, yields, or sleeps) to explore different thread
interleavings and expose data races that would otherwise occur only in
rare execution orders.

This change is inspired by prior work, which is discussed in more detail
at
https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969.
In short, https://reviews.llvm.org/D65383 was an earlier, unmerged
attempt at adding random delays. Feedback on the RFC led to the
version in this commit, which aims to limit the amount of delay.

The adaptive delay feature uses a configurable time budget and tiered
sampling strategy to balance race exposure against performance impact.
It prioritizes high-value synchronization points with clear
happens-before relationships: relaxed atomics receive lightweight spin
delays with low sampling, synchronizing atomics (acquire / release /
seq_cst) receive moderate delays with higher sampling, and mutex and
thread lifecycle operations receive the longest delays with highest
sampling.
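
The tiering and time budget described above can be sketched roughly as
follows. This is a minimal illustration, not the code in this commit:
every name (SyncKind, TierPolicy, TimeBudget, ShouldDelay), every
sampling rate and delay bound, and the choice of yield and sleep for the
upper tiers are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names and numbers throughout; not the TSAN runtime's code.
enum class SyncKind { RelaxedAtomic, SyncAtomic, MutexOrThread };
enum class DelayKind { Spin, Yield, Sleep };

struct TierPolicy {
  uint32_t sample_one_in;  // consider roughly 1 out of N events of this kind
  DelayKind delay;         // cheapest delay kind for the lowest tier
  uint64_t max_delay_ns;   // upper bound on a single injected delay
};

// Lower tiers are sampled rarely and delayed briefly; higher tiers are
// sampled more often and delayed longer.
constexpr TierPolicy PolicyFor(SyncKind kind) {
  if (kind == SyncKind::RelaxedAtomic) return {1000, DelayKind::Spin, 1000};
  if (kind == SyncKind::SyncAtomic) return {100, DelayKind::Yield, 10000};
  return {10, DelayKind::Sleep, 100000};
}

// Caps the total injected delay at a percentage of elapsed run time.
class TimeBudget {
 public:
  explicit TimeBudget(uint32_t percent) : percent_(percent) {
    assert(percent <= 100);
  }
  bool CanSpend(uint64_t elapsed_ns, uint64_t delay_ns) const {
    return (spent_ns_ + delay_ns) * 100 <= elapsed_ns * percent_;
  }
  void Spend(uint64_t delay_ns) { spent_ns_ += delay_ns; }

 private:
  uint32_t percent_;
  uint64_t spent_ns_ = 0;
};

// Decides whether the event_index-th event of this kind gets a delay.
// For simplicity the full max_delay_ns is charged to the budget up front.
bool ShouldDelay(SyncKind kind, uint64_t event_index, uint64_t elapsed_ns,
                 TimeBudget& budget) {
  const TierPolicy policy = PolicyFor(kind);
  if (event_index % policy.sample_one_in != 0) return false;  // not sampled
  if (!budget.CanSpend(elapsed_ns, policy.max_delay_ns)) return false;
  budget.Spend(policy.max_delay_ns);
  return true;
}
```

In this sketch the sampling check runs before the budget check, so
unsampled events never touch the budget state.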

The feature is disabled by default and incurs minimal overhead when not
enabled. Nearly all checks are guarded by an inline check on a global
variable that is only set when enable_adaptive_delay=1. Microbenchmarks
with tight loops of atomic operations showed no meaningful performance
difference between an unmodified TSAN runtime and this version when
running with empty TSAN_OPTIONS.
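
The guard pattern described above looks roughly like the sketch below.
The identifiers (g_adaptive_delay_enabled, AdaptiveDelaySlowPath,
MaybeDelay, InstrumentedFetchAdd) are hypothetical and chosen for the
example; only the flag name enable_adaptive_delay comes from this
commit.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical names; this only illustrates the guard pattern.
// Assumed to be set once at startup when enable_adaptive_delay=1 is given.
static bool g_adaptive_delay_enabled = false;

// Counts slow-path entries so the sketch has something observable.
static uint64_t g_slow_path_calls = 0;

// The slow path would do sampling, budget checks, and the delay itself.
static void AdaptiveDelaySlowPath() { ++g_slow_path_calls; }

// When the feature is off, the cost per hook is one load and one branch.
static inline void MaybeDelay() {
  if (g_adaptive_delay_enabled) AdaptiveDelaySlowPath();
}

// Stand-in for an instrumented atomic operation.
static uint64_t InstrumentedFetchAdd(std::atomic<uint64_t>& target,
                                     uint64_t value) {
  MaybeDelay();
  return target.fetch_add(value, std::memory_order_relaxed);
}
```

Running a program with TSAN_OPTIONS=enable_adaptive_delay=1 corresponds
to the flag being true here; with empty TSAN_OPTIONS the slow path is
never entered.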

An LLM assisted in writing portions of the adaptive delay logic,
including the TimeBudget class, tiering concept, address sampler, and
per-thread quota system. I reviewed the output and made amendments to
reduce duplication and simplify the behavior. I also replaced the LLM's
original double-based calculation logic with the integer-based Percent
class. The LLM also helped write unit test cases for Percent.
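
To illustrate what an integer-based percentage type can look like, here
is a minimal sketch. It is not the Percent class from this commit: the
basis-point representation and the FromBasisPoints and Of names are
assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; the commit's Percent class may differ in every detail.
class Percent {
 public:
  static constexpr uint32_t kScale = 10000;  // 10000 basis points == 100%

  static Percent FromBasisPoints(uint32_t basis_points) {
    assert(basis_points <= kScale);
    return Percent(basis_points);
  }

  // Returns floor(value * basis_points / kScale) without forming the full
  // product, by splitting value into its quotient and remainder by kScale.
  uint64_t Of(uint64_t value) const {
    return value / kScale * basis_points_ +
           value % kScale * basis_points_ / kScale;
  }

  uint32_t basis_points() const { return basis_points_; }

 private:
  explicit Percent(uint32_t basis_points) : basis_points_(basis_points) {}
  uint32_t basis_points_;
};
```

Keeping the arithmetic in integers avoids floating-point rounding and
makes results easy to pin down in unit tests.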

cc @dvyukov

## Examples

I used the delay scheduler to find novel bugs that rarely or never
occurred with the unmodified TSAN runtime. Some of the bugs below were
found with earlier versions of the delay scheduler that I iterated on,
but the implementation in this PR still finds them far more reliably
than the standard TSAN runtime.

- A use-after-free in the
  [BlazingMQ](https://github.com/bloomberg/blazingmq) broker during
  ungraceful producer disconnect.
- Race in stdexec: https://github.com/NVIDIA/stdexec/pull/1395
- Race in stdexec's MPSC queue:
  https://github.com/NVIDIA/stdexec/pull/1812
- A few races in [BDE](https://github.com/bloomberg/bde) thread-enabled
  data structures/algorithms.
- The "Data race on variable a" test from
  https://ceur-ws.org/Vol-2344/paper9.pdf is reproduced more reliably
  with more aggressive adaptive scheduler options.

## Outstanding work

- The
  [RFC](https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969)
  suggests moving the scheduler to sanitizer_common, so that ASAN can
  leverage this. This should be done (should it be done in this PR?).
- Missing interceptors for libdispatch

GitOrigin-RevId: a591a44653eabdb7ec0c158019f97c9b1547a5ea