)]}'
{
  "commit": "d5bf514200046fd104c63fee8a786a04f52a78c0",
  "tree": "1056f307ec95705c11cc679d06fcbd3a24f58513",
  "parents": [
    "9b4c99a1e4887cdb9c045e6ec06d7d832add07cb"
  ],
  "author": {
    "name": "modiking",
    "email": "mmo@nvidia.com",
    "time": "Mon Feb 23 19:44:05 2026 -0800"
  },
  "committer": {
    "name": "GitHub",
    "email": "noreply@github.com",
    "time": "Mon Feb 23 19:44:05 2026 -0800"
  },
  "message": "[NVPTX] Scalarize v2f32 instructions if input operand guarantees need for register coalescing (#180113)\n\nThe support of f32 packed instructions in #126337 revealed performance\nregressions on certain kernels. In one case, the cause comes from\nloading a v4f32 from shared memory but then accessing its elements as {r0, r2}\nand {r1, r3} from the full load of {r0, r1, r2, r3}.\n\nThis access pattern guarantees that the registers require a coalescing\noperation, which increases register pressure and degrades performance.\nThe fix here is to identify if we can prove that a v2f32 operand comes\nfrom non-contiguous vector extracts and, if so, scalarize the operation\nso the coalescing operation is no longer needed.\n\nI\u0027ve found that ptxas can see through the extra unpacks/repacks of\ncontiguous registers this causes in MIR. However, in the full test case\nthe packing of the final scalar-\u003evector results does generate additional\ncosts, especially since the only users unpack them. An additional MIR\npass is possible to catch this case.\n\nAssisted-by: Cursor / claude-4.6-opus-high\n\n---------\n\nCo-authored-by: Princeton Ferro \u003cprincetonferro@gmail.com\u003e",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "8f1b70533a869887319adfc56d496cd24a0e1ac0",
      "old_mode": 33188,
      "old_path": "llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp",
      "new_id": "f5554be155eac7bc43e7e2d339abfd38c24572c4",
      "new_mode": 33188,
      "new_path": "llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp"
    },
    {
      "type": "add",
      "old_id": "0000000000000000000000000000000000000000",
      "old_mode": 0,
      "old_path": "/dev/null",
      "new_id": "f953b865a004ee265f72e9dfe97f6857c86b8ec0",
      "new_mode": 33188,
      "new_path": "llvm/test/CodeGen/NVPTX/scalarize-non-coalescable-v2f32.ll"
    }
  ]
}
