314ef3e25602f04379c6317e071eb5d8ea62add2 - llvm

commit	314ef3e25602f04379c6317e071eb5d8ea62add2
author	Sanjay Patel <spatel@rotateright.com>	Sun Feb 10 15:22:06 2019 +0000
committer	Sanjay Patel <spatel@rotateright.com>	Sun Feb 10 15:22:06 2019 +0000
tree	16dc84ec7bcea54350bac5d871d3b5aca8e3dc08
parent	5cb368b9cef3e6c04e1612e73c391ead8f7722f5

[x86] narrow 256-bit horizontal ops via demanded elements

256-bit horizontal math ops are an x86 monstrosity (and thankfully have
not been extended to 512-bit AFAIK).

The two 128-bit halves operate on separate halves of the inputs. So if we
don't demand anything in the upper half of the result, we can extract the
low halves of the inputs, do the math, and then insert that result into a
256-bit output.

All of the extract/insert is free (ymm<-->xmm), so we're left with a
narrower (cheaper) version of the original op.
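
To make the lane behavior concrete, here is a minimal scalar sketch in C++ (not
code from the patch). It assumes the float horizontal add as the example op.
The names hadd128, hadd256, extractLow, insertLow, and narrowedHadd are
hypothetical, and the zero fill in insertLow only stands in for the
"don't care" upper half.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<float, 4>;
using V8 = std::array<float, 8>;

// Scalar model of a 128-bit horizontal add: adjacent pairs of a, then of b.
V4 hadd128(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Scalar model of the 256-bit form: each 128-bit half of the result is
// computed only from the matching 128-bit halves of the inputs.
V8 hadd256(const V8 &a, const V8 &b) {
  V8 r{};
  for (std::size_t half = 0; half < 2; ++half) {
    const std::size_t o = half * 4;
    r[o + 0] = a[o + 0] + a[o + 1];
    r[o + 1] = a[o + 2] + a[o + 3];
    r[o + 2] = b[o + 0] + b[o + 1];
    r[o + 3] = b[o + 2] + b[o + 3];
  }
  return r;
}

// Low-half extract (ymm -> xmm in the real ISA).
V4 extractLow(const V8 &v) { return {v[0], v[1], v[2], v[3]}; }

// Insert into a 256-bit value; the upper half is not demanded, so the
// zeros here are placeholders only.
V8 insertLow(const V4 &v) {
  return {v[0], v[1], v[2], v[3], 0.0f, 0.0f, 0.0f, 0.0f};
}

// The narrowed form: extract low halves, do the 128-bit op, insert.
V8 narrowedHadd(const V8 &a, const V8 &b) {
  return insertLow(hadd128(extractLow(a), extractLow(b)));
}

// Compare only the lanes that are demanded (the low four).
bool lowHalvesMatch(const V8 &x, const V8 &y) {
  for (std::size_t i = 0; i < 4; ++i)
    if (x[i] != y[i])
      return false;
  return true;
}
```

For any inputs a and b, lowHalvesMatch(hadd256(a, b), narrowedHadd(a, b))
holds in this model, which is the property the transform relies on.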

In the affected tests based on:
https://bugs.llvm.org/show_bug.cgi?id=33758
https://bugs.llvm.org/show_bug.cgi?id=38971
...we see that the h-op narrowing can result in further narrowing of other
math via existing generic transforms.

I originally drafted this patch as an exact pattern match starting from
extract_vector_elt, but I thought we might see diffs starting from
extract_subvector too, so I changed it to a more general demanded elements
solution. That switch does not improve any additional existing regression
tests, though, so we could go back to the exact pattern match.
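
As an illustration of the demanded-elements idea (a sketch under assumptions,
not the code in this patch), the check can be modeled with a plain bitmask in
which bit i is set when lane i of the result is used. The helper name
upperHalfUnused is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when no lane in the upper half of a
// numElts-wide vector is demanded, i.e. when narrowing would be legal.
bool upperHalfUnused(std::uint32_t demanded, unsigned numElts) {
  // Shift away the low half's bits; any bit left is a demanded upper lane.
  return (demanded >> (numElts / 2)) == 0;
}
```

With 8 lanes, an extract of element 0 corresponds to the mask 0x01 and an
extract of the low subvector to 0x0F. Both pass the check, which is why the
demanded-elements form covers both starting points.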

Differential Revision: https://reviews.llvm.org/D57841

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353641 91177308-0d34-0410-b5e6-96231b3b80d8

3 files changed
