[CodeGenPrepare] limit formation of overflow intrinsics (PR41129)

This probably restricts the transform more than necessary, but we don't yet have any evidence
that it has led to real-world perf improvements rather than regressions, so I'm making a
quick, blunt fix.

In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid creating an extra instruction in the
dominating block, because that instruction would then execute even on paths that don't need
its result, which could be costly.
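For illustration, here is a hypothetical C function with the same general shape (a sketch, not
the code from the PR or the regression test; the function name is made up). The compare against
zero sits in the dominating block, and the decrement appears on only one path. Since `key == 0`
is exactly the borrow condition of `key - 1`, the compare and the decrement could be fused into
one `usub.with.overflow`, but only by hoisting the subtraction into the dominating block, where
it would also run on the path that never uses it.

```c
#include <stdint.h>

/* Hypothetical example: the compare is in the entry (dominating) block,
 * while the decrement is needed on only one of the two paths. */
uint64_t pr41129_like(uint64_t *p) {
    uint64_t key = *p;
    if (key == 0) {
        /* The path where key - 1 borrows: 0 - 1 wraps to UINT64_MAX. */
        *p = key - 1;
    } else {
        /* This path never needs key - 1; hoisting the subtraction into
         * the entry block to form the overflow intrinsic would add an
         * instruction here as well. */
        *p = key & 7;
    }
    return key;
}
```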

The x86 LSR test diff reverses the changes from D57789. There's no evidence yet that one
version is any better than the other.

Differential Revision: https://reviews.llvm.org/D59602

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356665 91177308-0d34-0410-b5e6-96231b3b80d8
4 files changed