[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314)

This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)

GitOrigin-RevId: 10056c821a56a19cef732129e4e0c5883ae1ee49
26 files changed
tree: fb44da0bdf023a8409b7c08f022ec2c84a97c1d2
  1. benchmark/
  2. cmake/
  3. docs/
  4. examples/
  5. include/
  6. lib/
  7. python/
  8. test/
  9. tools/
  10. unittests/
  11. utils/
  12. .clang-format
  13. .clang-tidy
  14. CMakeLists.txt
  15. LICENSE.TXT
  16. README.md
README.md

Multi-Level Intermediate Representation

See https://mlir.llvm.org/ for more information.