Implement async_work_group_copy builtin v3

This is a simple implementation that just copies the data synchronously.

v2:
  - Use size_t.

v3:
  - Fix a possible race condition by splitting the copy among multiple
    work items.
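The v3 change can be pictured with a small host-side C model. This is a sketch under stated assumptions, not the libclc code. The names copy_share and group_copy are hypothetical. A strided split is assumed, where work item lid copies elements lid, lid + wg_size, and so on, so every destination element is written by exactly one work item. The work items run one after another in a plain loop here, standing in for the work-group executing them in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one work item's share of the copy.
 * Work item `lid` of a work-group of `wg_size` items copies the
 * elements lid, lid + wg_size, lid + 2 * wg_size, ... (assumed
 * strided split), so no two work items write the same element. */
static void copy_share(int *dst, const int *src, size_t num_elements,
                       size_t lid, size_t wg_size)
{
    assert(wg_size > 0); /* a zero stride would never terminate */
    for (size_t i = lid; i < num_elements; i += wg_size)
        dst[i] = src[i];
}

/* Runs every work item's share in turn. On a device these shares
 * would execute in parallel; the result is the same because the
 * shares do not overlap. */
static void group_copy(int *dst, const int *src, size_t num_elements,
                       size_t wg_size)
{
    for (size_t lid = 0; lid < wg_size; ++lid)
        copy_share(dst, src, num_elements, lid, wg_size);
}
```

Because each element has a single writer, the split avoids work items writing the same destination locations concurrently.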

git-svn-id: https://llvm.org/svn/llvm-project/libclc/trunk@219008 91177308-0d34-0410-b5e6-96231b3b80d8
6 files changed