[libcxx/variant] Introduce `switch`-based mechanism for `std::visit`.

This patch introduces mechanism for `std::visit` backed by `switch`.
The `switch` is structured such that it's a flattened manual vtable (an n-ary array).
The `switch` mechanism is enabled if `(1 * ... * vs.size()) < 1024`.

The following are performance numbers from the benchmarks added in D85419, tested on my 2017 Macbook Pro.

```
$ ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
2020-08-09 23:55:14
Running ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
Load Average: 2.03, 2.36, 2.43
------------------------------------------------------------
Benchmark                 Time             CPU   Iterations
------------------------------------------------------------
BM_Visit<1, 1>        0.260 ns        0.260 ns   1000000000
BM_Visit<1, 2>         1.56 ns         1.56 ns    435925220
BM_Visit<1, 3>         1.55 ns         1.55 ns    444416228
BM_Visit<1, 4>         1.57 ns         1.57 ns    427951336
BM_Visit<1, 5>         1.57 ns         1.56 ns    444766371
BM_Visit<1, 6>         1.70 ns         1.68 ns    446639358
BM_Visit<1, 7>         1.64 ns         1.64 ns    400441630
BM_Visit<1, 8>         1.56 ns         1.56 ns    430729471
BM_Visit<1, 9>         1.58 ns         1.58 ns    449894596
BM_Visit<1, 10>        1.54 ns         1.54 ns    449660506
BM_Visit<1, 20>        1.56 ns         1.56 ns    450813074
BM_Visit<1, 30>        1.59 ns         1.59 ns    440032940
BM_Visit<1, 40>        1.59 ns         1.59 ns    443731656
BM_Visit<1, 50>        1.56 ns         1.56 ns    444709859
BM_Visit<1, 60>        1.59 ns         1.58 ns    439527320
BM_Visit<1, 70>        1.57 ns         1.57 ns    438450890
BM_Visit<1, 80>        1.58 ns         1.58 ns    443001525
BM_Visit<1, 90>        1.63 ns         1.62 ns    448456349
BM_Visit<1, 100>       1.57 ns         1.57 ns    445740630

$ ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
2020-08-09 23:59:35
Running ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
Load Average: 1.40, 1.94, 2.22
-----------------------------------------------------------
Benchmark                Time             CPU   Iterations
-----------------------------------------------------------
BM_Visit<2, 1>       0.261 ns        0.260 ns   1000000000
BM_Visit<2, 2>        1.55 ns         1.54 ns    432844219
BM_Visit<2, 3>        1.30 ns         1.30 ns    532529974
BM_Visit<2, 4>        1.54 ns         1.54 ns    446055910
BM_Visit<2, 5>        1.31 ns         1.31 ns    531099680
BM_Visit<2, 6>        1.56 ns         1.56 ns    443203475
BM_Visit<2, 7>        1.29 ns         1.29 ns    526478087
BM_Visit<2, 8>        1.56 ns         1.56 ns    439000834
BM_Visit<2, 9>        1.30 ns         1.30 ns    528756817
BM_Visit<2, 10>       1.56 ns         1.55 ns    442923039
BM_Visit<2, 20>       1.35 ns         1.35 ns    517021072
BM_Visit<2, 30>       1.60 ns         1.59 ns    419724661
BM_Visit<2, 40>       1.45 ns         1.44 ns    472137163
BM_Visit<2, 50>       1.65 ns         1.65 ns    421389743

$ ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
2020-08-10 00:01:32
Running ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
Load Average: 2.20, 2.01, 2.21
-----------------------------------------------------------
Benchmark                Time             CPU   Iterations
-----------------------------------------------------------
BM_Visit<3, 1>       0.272 ns        0.271 ns   1000000000
BM_Visit<3, 2>        1.87 ns         1.86 ns    361858090
BM_Visit<3, 3>        1.77 ns         1.77 ns    391192579
BM_Visit<3, 4>        1.84 ns         1.84 ns    374694223
BM_Visit<3, 5>        1.75 ns         1.75 ns    408270392
BM_Visit<3, 6>        1.88 ns         1.88 ns    378759185
BM_Visit<3, 7>        1.79 ns         1.79 ns    395498102
BM_Visit<3, 8>        1.85 ns         1.85 ns    371660366
BM_Visit<3, 9>        1.80 ns         1.80 ns    386872851
BM_Visit<3, 10>       1.84 ns         1.84 ns    362367606
BM_Visit<3, 15>       1.77 ns         1.77 ns    392060220
BM_Visit<3, 20>       1.85 ns         1.85 ns    379157188
```

```
$ ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
2020-08-10 00:05:57
Running ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
Load Average: 2.27, 2.36, 2.34
------------------------------------------------------------
Benchmark                 Time             CPU   Iterations
------------------------------------------------------------
BM_Visit<1, 1>        0.271 ns        0.271 ns   1000000000
BM_Visit<1, 2>        0.269 ns        0.269 ns   1000000000
BM_Visit<1, 3>        0.271 ns        0.271 ns   1000000000
BM_Visit<1, 4>        0.270 ns        0.270 ns   1000000000
BM_Visit<1, 5>        0.269 ns        0.269 ns   1000000000
BM_Visit<1, 6>        0.270 ns        0.269 ns   1000000000
BM_Visit<1, 7>        0.265 ns        0.265 ns   1000000000
BM_Visit<1, 8>        0.269 ns        0.269 ns   1000000000
BM_Visit<1, 9>        0.268 ns        0.268 ns   1000000000
BM_Visit<1, 10>       0.269 ns        0.269 ns   1000000000
BM_Visit<1, 20>       0.267 ns        0.267 ns   1000000000
BM_Visit<1, 30>       0.272 ns        0.272 ns   1000000000
BM_Visit<1, 40>       0.268 ns        0.268 ns   1000000000
BM_Visit<1, 50>       0.268 ns        0.268 ns   1000000000
BM_Visit<1, 60>       0.268 ns        0.268 ns   1000000000
BM_Visit<1, 70>       0.269 ns        0.269 ns   1000000000
BM_Visit<1, 80>       0.266 ns        0.266 ns   1000000000
BM_Visit<1, 90>       0.268 ns        0.268 ns   1000000000
BM_Visit<1, 100>      0.267 ns        0.267 ns   1000000000

$ ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
2020-08-12 04:09:59
Running ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
Load Average: 2.17, 4.20, 4.78
-----------------------------------------------------------
Benchmark                Time             CPU   Iterations
-----------------------------------------------------------
BM_Visit<2, 1>       0.302 ns        0.301 ns   1000000000
BM_Visit<2, 2>       0.297 ns        0.295 ns   1000000000
BM_Visit<2, 3>       0.353 ns        0.351 ns   1000000000
BM_Visit<2, 4>       0.276 ns        0.276 ns   1000000000
BM_Visit<2, 5>       0.285 ns        0.283 ns   1000000000
BM_Visit<2, 6>       0.290 ns        0.287 ns   1000000000
BM_Visit<2, 7>       0.282 ns        0.280 ns   1000000000
BM_Visit<2, 8>       0.290 ns        0.287 ns   1000000000
BM_Visit<2, 9>       0.291 ns        0.285 ns   1000000000
BM_Visit<2, 10>      0.293 ns        0.287 ns   1000000000
BM_Visit<2, 20>       1.70 ns         1.68 ns    391400375
BM_Visit<2, 30>       1.64 ns         1.63 ns    418925874
BM_Visit<2, 40>       1.63 ns         1.62 ns    423623677
BM_Visit<2, 50>       1.68 ns         1.67 ns    411687212

$ ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
2020-08-12 04:10:43
Running ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
Load Average: 1.57, 3.76, 4.59
-----------------------------------------------------------
Benchmark                Time             CPU   Iterations
-----------------------------------------------------------
BM_Visit<3, 1>       0.271 ns        0.270 ns   1000000000
BM_Visit<3, 2>       0.344 ns        0.334 ns   1000000000
BM_Visit<3, 3>       0.347 ns        0.336 ns   1000000000
BM_Visit<3, 4>       0.300 ns        0.296 ns   1000000000
BM_Visit<3, 5>       0.290 ns        0.286 ns   1000000000
BM_Visit<3, 6>       0.272 ns        0.271 ns   1000000000
BM_Visit<3, 7>        1.72 ns         1.71 ns    415765841
BM_Visit<3, 8>        1.73 ns         1.72 ns    408909555
BM_Visit<3, 9>        2.16 ns         2.04 ns    380898485
BM_Visit<3, 10>       2.45 ns         2.40 ns    295714256
BM_Visit<3, 15>       1.92 ns         1.85 ns    375990332
BM_Visit<3, 20>       1.66 ns         1.65 ns    414456233
```

Differential Revision: https://reviews.llvm.org/D85420

GitOrigin-RevId: a175a96517c5d9dc05ba13a6481b1b031a53a22f
2 files changed