[MLIR][XeGPU] Support partial subgroup lane distribution  (#201667)

for convert_layout

Add lowering support in XeGPUSgToLaneDistribute for values that are
distributed across only a fraction of the subgroup.

- SgToLaneConvertLayout now lowers a rank-2 xegpu.convert_layout that
  shrinks the lane layout along the outer (distributed) dimension while
  keeping lane_data unchanged (e.g. [16, 1] -> [8, 1]). The partial-subgroup
  case is detected directly in the pattern: equal order, rank 2, unit inner
  lane layout, and a genuinely distributed outer lane layout (> 1, which also
  rules out the degenerate [1, 1] layout). Because the data is no longer
  replicated in every lane, it is gathered across lanes and the distributed
  outer dimension is doubled when the lane count is halved.

- The cross-lane gather is factored into a dedicated helper,
  shuffleDataAsLaneLayoutChange(): it bitcasts the source to i32, issues
  gpu.shuffle up to fetch the values from the dropped lanes, and concatenates
  the lane-local and gathered data with vector.shuffle. Only halving the lane
  count (factor of two), rank-2 vectors, and bit widths that are a multiple
  of 32 are supported; other cases fail the match.

- SgToLaneVectorExtractStridedSlice now adjusts the effective subgroup size
  when the source lane layout along the distributed dimension is smaller than
  the hardware subgroup size, so slice offsets/sizes are scaled correctly
  (e.g. a subgroup-space offset of 8 maps to a distributed offset of 1).

Add a unit test exercising the dpas_mx scale operand path.
2 files changed
tree: b6121bccb41f5dabd30362a28bd5506bf9fb06db
  1. .ci/
  2. .github/
  3. bolt/
  4. clang/
  5. clang-tools-extra/
  6. cmake/
  7. compiler-rt/
  8. cross-project-tests/
  9. flang/
  10. flang-rt/
  11. libc/
  12. libclc/
  13. libcxx/
  14. libcxxabi/
  15. libsycl/
  16. libunwind/
  17. lld/
  18. lldb/
  19. llvm/
  20. llvm-libgcc/
  21. mlir/
  22. offload/
  23. openmp/
  24. orc-rt/
  25. polly/
  26. runtimes/
  27. third-party/
  28. utils/
  29. .clang-format
  30. .clang-format-ignore
  31. .clang-tidy
  32. .git-blame-ignore-revs
  33. .gitattributes
  34. .gitignore
  35. .mailmap
  36. CODE_OF_CONDUCT.md
  37. CONTRIBUTING.md
  38. LICENSE.TXT
  39. pyproject.toml
  40. README.md
  41. SECURITY.md
README.md

The LLVM Compiler Infrastructure

OpenSSF Scorecard OpenSSF Best Practices libc++

Welcome to the LLVM project!

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.

C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

Consult the Getting Started with LLVM page for information on building and running LLVM.

For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting in touch

Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.

The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.