[compiler-rt] Use __atomic builtins whenever possible

The code in this file dates back to 2012 when Clang's support for atomic
builtins was still quite limited. The bugs referenced in the comment
at the top of the file have long been fixed and using the compiler
builtins directly should now generate slightly better code.
Additionally, this allows using the atomic builtin header for platforms
where the __sync_builtins are lacking (e.g. Arm Morello).

This change does not introduce any code generation changes for
__tsan_read*/__tsan_write* or __tsan_func_{entry,exit} on x86, which
indicates the previously noted compiler issues have been fixed.

We also have to touch the non-clang codepaths here since the only way we
can make this work easily is by making the memory_order enum match the
compiler-provided macros, so we have to update the debug checks that
assumed the enum was always a bitflag.

The one downside of this change is that 32-bit MIPS now definitely
requires libatomic (but that may already have been needed for RMW ops).

Reviewed By: dvyukov

Pull Request: https://github.com/llvm/llvm-project/pull/84439

GitOrigin-RevId: abd5e45a96954d80f6ffe6d8676c0059fae8573b
7 files changed