[AArch64] Use ADDP tree for v16i8 to i16 bitmask extraction (#199812)
Re-land of #192974, reverted in 868aefd.
The original PR was reverted because the new lowering produced an
EXTRACT_VECTOR_ELT with an illegal i16 result type, which tripped the
operation legalizer when called from combineBoolVectorAndTruncateStore
on a `<32 x i1>` store split into two `<16 x i1>` halves. The lowering
now returns i32 instead, which the caller's existing getZExtOrTrunc
already converts to the required type, so no illegal i16 node is created.
Regression test added: bitmask_v32i8_split in
vec-combine-compare-to-bitmask.ll.
Note: in alias_mask.ll's whilewr_8_split2, the four halfword bitmask
results are now stored with four separate `str h` instructions rather
than packed into a d-register via ZIP1+EXT before a single store. The
result is functionally equivalent and uses slightly fewer NEON
arithmetic ops. This is a side effect of the i32 return type: the
store-merging combine does not match the resulting shape.
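For readers unfamiliar with the technique named in the title, here is a rough scalar model of an ADDP-tree bitmask extraction. It is a sketch and not the code in this patch. It assumes each compare lane is 0x00 or 0xFF, that lanes are first ANDed with a per-lane bit constant, and that three pairwise adds then fold 16 bytes down to 2. The names `V16`, `addp`, and `bitmask_v16i8` are hypothetical, and the exact node sequence the backend emits may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V16 = std::array<uint8_t, 16>;  // hypothetical stand-in for a v16i8 register

// Scalar model of `ADDP Vd.16B, Vn.16B, Vm.16B`: adjacent pairs of Vn fill
// the low half of the result, adjacent pairs of Vm fill the high half.
V16 addp(const V16 &n, const V16 &m) {
  V16 d{};
  for (std::size_t i = 0; i < 8; ++i) {
    d[i] = static_cast<uint8_t>(n[2 * i] + n[2 * i + 1]);
    d[i + 8] = static_cast<uint8_t>(m[2 * i] + m[2 * i + 1]);
  }
  return d;
}

// Assumption: every lane of `cmp` is 0x00 (false) or 0xFF (true).
// The result is returned in a 32-bit integer with the upper 16 bits zero,
// loosely mirroring the i32 return type described above.
uint32_t bitmask_v16i8(const V16 &cmp) {
  V16 v{};
  // Keep one distinct bit per lane within each 8-lane half: 1, 2, 4, ..., 128.
  for (std::size_t i = 0; i < 16; ++i)
    v[i] = cmp[i] & static_cast<uint8_t>(1u << (i % 8));
  // Three pairwise-add steps: 16 -> 8 -> 4 -> 2 meaningful bytes.
  // The bits are disjoint, so the byte sums cannot overflow.
  v = addp(v, v);
  v = addp(v, v);
  v = addp(v, v);
  // Byte 0 holds lanes 0-7, byte 1 holds lanes 8-15.
  return static_cast<uint32_t>(v[0]) | (static_cast<uint32_t>(v[1]) << 8);
}
```

In this model, a `<32 x i1>` mask split into two halves would be reassembled by the caller as `lo | (hi << 16)` after narrowing each half to 16 bits.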
GitOrigin-RevId: 7966cbbdd02b686dbee9134514ea113772bcfa62
9 files changed