[x86] vectorize more cast ops in lowering to avoid register file transfers

This is a follow-up to D56864.

If we're extracting an element from a non-zero index before casting to FP,
then shuffle that element into lane 0 and optionally narrow the vector
before doing the cast, so that the cast itself is a vector op:

cast (extelt V, C) --> extelt (cast (extract_subv (shuffle V, [C...]))), 0
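
As a rough illustration of why the rewrite above preserves the result, here
is a minimal sketch that models vectors as plain Python lists. Every helper
name in it (shuffle, extract_subv, cast_vec, scalar_path, vector_path) is
hypothetical and is not LLVM API; undef shuffle lanes are modeled as 0 and
the int-to-FP cast is modeled with float(). The real lowering operates on
SelectionDAG nodes, not on lists.

```python
from typing import List, Sequence


def shuffle(v: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Permute lanes: result[i] = v[mask[i]]; -1 marks an undef lane (modeled as 0)."""
    return [v[m] if m >= 0 else 0 for m in mask]


def extract_subv(v: Sequence[int], width: int) -> List[int]:
    """Keep only the low `width` lanes (the optional narrowing step)."""
    return list(v[:width])


def cast_vec(v: Sequence[int]) -> List[float]:
    """Lane-wise int-to-FP cast."""
    return [float(x) for x in v]


def scalar_path(v: Sequence[int], c: int) -> float:
    """cast (extelt V, C): extract lane C first, then cast the scalar."""
    return float(v[c])


def vector_path(v: Sequence[int], c: int, narrow_to: int) -> float:
    """extelt (cast (extract_subv (shuffle V, [C...]))), 0"""
    mask = [c] + [-1] * (len(v) - 1)   # move lane C to lane 0, rest undef
    shuffled = shuffle(v, mask)
    narrowed = extract_subv(shuffled, narrow_to)
    return cast_vec(narrowed)[0]


if __name__ == "__main__":
    v = [10, 20, 30, 40, 50, 60, 70, 80]
    for c in range(1, len(v)):
        # Both forms must agree for every non-zero extract index.
        assert scalar_path(v, c) == vector_path(v, c, narrow_to=4)
```

Only lane 0 of the shuffled vector is ever read, so the contents of the
remaining lanes do not affect the result, and narrowing to any width of at
least one lane is safe in this model.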

This might be enough to close PR39974:
https://bugs.llvm.org/show_bug.cgi?id=39974

Differential Revision: https://reviews.llvm.org/D58197

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354619 91177308-0d34-0410-b5e6-96231b3b80d8
3 files changed