[AArch64] Improve getPartialReductionCost for fixed-width VFs (#126538)
NEON does not have a version of udot/sdot that accumulates into
64-bit integer values, so we should return Invalid from
getPartialReductionCost for fixed-width VFs when the accumulator
type is 64 bits wide (i64). In theory, if the 64-bit versions of
SVE udot/sdot are available, we could use those, but we don't
currently have lowering support for that.
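The sketch below models the reasoning above in plain C++. It is not LLVM code. `udotLane` and `fixedWidthDotProdLegal` are hypothetical helpers. `udotLane` approximates how one 32-bit lane of a NEON UDOT accumulates four unsigned 8-bit products, which shows that the accumulator lane is fixed at 32 bits. `fixedWidthDotProdLegal` reproduces only the fixed-width condition in this diff. The real cost function performs other checks that are not modeled here.

```cpp
#include <cstdint>

// Hypothetical model of one 32-bit accumulator lane of NEON UDOT:
// four unsigned 8-bit products are widened to 32 bits and added to the
// lane, wrapping modulo 2^32. No NEON form has a 64-bit accumulator lane.
constexpr uint32_t udotLane(uint32_t acc, const uint8_t (&a)[4],
                            const uint8_t (&b)[4]) {
  for (int i = 0; i < 4; ++i)
    acc += static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i]);
  return acc;
}

// Hypothetical, simplified mirror of the fixed-width condition in the patch.
// Returning false corresponds to getPartialReductionCost returning Invalid.
constexpr bool fixedWidthDotProdLegal(bool neonAvailable, bool hasDotProd,
                                      unsigned accumBits) {
  if (!neonAvailable || !hasDotProd)
    return false;
  // NEON udot/sdot only accumulate into 32-bit lanes, so an i64
  // accumulator cannot be lowered to them for fixed-width VFs.
  return accumBits != 64;
}
```

For example, `udotLane(10, {1, 2, 3, 4}, {5, 6, 7, 8})` yields 10 + 5 + 12 + 21 + 32 = 80. `fixedWidthDotProdLegal(true, true, 64)` is false, which matches the new `AccumEVT == MVT::i64` check.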
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 325056c..bd0d55f 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4705,7 +4705,8 @@
if (VFMinValue == Scale)
return Invalid;
}
- if (VF.isFixed() && (!ST->isNeonAvailable() || !ST->hasDotProd()))
+ if (VF.isFixed() &&
+ (!ST->isNeonAvailable() || !ST->hasDotProd() || AccumEVT == MVT::i64))
return Invalid;
if (InputEVT == MVT::i8) {