[mlir][xegpu] Add support for `vector.multi_reduction` and `vector.shape_cast` SIMT distribution.  (#157560)

Add support for distributing the `vector.multi_reduction` operation
across lanes in a warp. Currently only 2D to 1D reductions are
supported. Given layouts for the source and accumulator vectors,
* If the reduction dimension is distributed across lanes, the reduction
is non-lane-local and the reduction is done using warp shuffles. Here we
simply rewrite the `MultiDimReductionOp` to a sequence of `ReductionOp`s
inside the warp op body. Actual distribution will be done by
`WarpOpReduction` pattern.
* If the reduction dimension is not distributed across lanes, the
reduction is lane-local. In this case, we yield the source and
accumulator vectors from the warp op and perform the lane-local
reduction outside the warp op using a sequence of `ReductionOp`s.

PR also adds support for distributing `vector.shape_cast` based on
layouts.

GitOrigin-RevId: 9b0d7ddb04665e76cfa90b5d69c6183b90772243
8 files changed
tree: 612f70484bb5377620baaccd880528ed9ae87a51
  1. benchmark/
  2. cmake/
  3. docs/
  4. examples/
  5. include/
  6. lib/
  7. python/
  8. test/
  9. tools/
  10. unittests/
  11. utils/
  12. .clang-format
  13. .clang-tidy
  14. CMakeLists.txt
  15. LICENSE.TXT
  16. Maintainers.md
  17. README.md
README.md

Multi-Level Intermediate Representation

See https://mlir.llvm.org/ for more information.