[AArch64] Add FP16 broadcast and transpose costs
The FP16 broadcast and transpose can always use the same instructions as are
used for i16 vectors, with or without +fullfp16. This fills in some extra costs
to make sure we get them right.
Differential Revision: https://reviews.llvm.org/D146035
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 33c6555..3e0f419 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3253,6 +3253,8 @@
{TTI::SK_Broadcast, MVT::v2i32, 1},
{TTI::SK_Broadcast, MVT::v4i32, 1},
{TTI::SK_Broadcast, MVT::v2i64, 1},
+ {TTI::SK_Broadcast, MVT::v4f16, 1},
+ {TTI::SK_Broadcast, MVT::v8f16, 1},
{TTI::SK_Broadcast, MVT::v2f32, 1},
{TTI::SK_Broadcast, MVT::v4f32, 1},
{TTI::SK_Broadcast, MVT::v2f64, 1},
@@ -3265,6 +3267,8 @@
{TTI::SK_Transpose, MVT::v2i32, 1},
{TTI::SK_Transpose, MVT::v4i32, 1},
{TTI::SK_Transpose, MVT::v2i64, 1},
+ {TTI::SK_Transpose, MVT::v4f16, 1},
+ {TTI::SK_Transpose, MVT::v8f16, 1},
{TTI::SK_Transpose, MVT::v2f32, 1},
{TTI::SK_Transpose, MVT::v4f32, 1},
{TTI::SK_Transpose, MVT::v2f64, 1},