
 ======================
 LLVM 3.7 Release Notes
 ======================

 .. contents::
     :local:

 Introduction
 ============

 This document contains the release notes for the LLVM Compiler Infrastructure,
 release 3.7.  Here we describe the status of LLVM, including major improvements
 from the previous release, improvements in various subprojects of LLVM, and
 some of the current users of the code.  All LLVM releases may be downloaded
 from the `LLVM releases web site <http://llvm.org/releases/>`_.

 For more information about LLVM, including information about the latest
 release, please check out the `main LLVM web site <http://llvm.org/>`_.  If you
 have questions or comments, the `LLVM Developer's Mailing List
 <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ is a good place to send
 them.

 Note that if you are reading this file from a Subversion checkout or the main
 LLVM web page, this document applies to the *next* release, not the current
 one.  To see the release notes for a specific release, please see the `releases
 page <http://llvm.org/releases/>`_.

 Non-comprehensive list of changes in this release
 =================================================

 .. NOTE
    For small 1-3 sentence descriptions, just add an entry at the end of
    this list. If your description won't fit comfortably in one bullet
    point (e.g. maybe you would like to give an example of the
    functionality, or simply have a lot to talk about), see the `NOTE` below
    for adding a new subsection.

 * The minimum required Visual Studio version for building LLVM is now 2013
   Update 4.

 * A new documentation page, :doc:`Frontend/PerformanceTips`, contains a
   collection of tips for frontend authors on how to generate IR which LLVM is
   able to effectively optimize.

 * The ``DataLayout`` is no longer optional. All the IR-level optimizations expect
   it to be present, and the API has been changed to use a reference instead of
   a pointer to make this explicit. The ``Module`` owns the ``DataLayout``, which
   has to match the one attached to the ``TargetMachine`` for generating code.

   In 3.6, a pass was inserted in the pipeline to make the ``DataLayout`` accessible:
     ``MyPassManager->add(new DataLayoutPass(MyTargetMachine->getDataLayout()));``
   In 3.7, you don't need a pass; you set the ``DataLayout`` on the ``Module``:
     ``MyModule->setDataLayout(MyTargetMachine->createDataLayout());``

   The LLVM C API ``LLVMGetTargetMachineData`` is deprecated to reflect the fact
   that it won't be available anymore from ``TargetMachine`` in 3.8.

 * Comdats are now orthogonal to the linkage. LLVM will not create
   comdats for weak linkage globals and the frontends are responsible
   for explicitly adding them.

 * On ELF we now support multiple sections with the same name and
   comdat. This allows for smaller object files since multiple
   sections can have a simple name (``.text``, ``.rodata``, etc.).

 * LLVM now lazily loads metadata in some cases. Creating archives
   with IR files with debug info is now 25X faster.

 * llvm-ar can create archives in the BSD format used by OS X.

 * LLVM received a backend for the extended Berkeley Packet Filter
   instruction set that can be dynamically loaded into the Linux kernel via the
   `bpf(2) <http://man7.org/linux/man-pages/man2/bpf.2.html>`_ syscall.

   Support for BPF has been present in the kernel for some time, but starting
   with kernel 3.18 it has been extended with features such as 64-bit registers,
   8 additional registers, conditional backwards jumps, a call instruction,
   shift instructions, maps (hash table, array, etc.), 1-8 byte load/store from
   the stack, and more.

   Up until now, users of BPF had to write bytecode by hand, or use
   custom generators. This release adds a proper LLVM backend target for the BPF
   bytecode architecture.

   The BPF target is now available by default, and options exist in both Clang
   (``-target bpf``) and llc (``-march=bpf``) to pick eBPF as a backend.

 * Switch-case lowering was rewritten to avoid generating unbalanced search trees
   (`PR22262 <http://llvm.org/pr22262>`_) and to exploit profile information
   when available. Some lowering strategies are now disabled when optimizations
   are turned off, to save compile time.

 * The debug info IR class hierarchy now inherits from ``Metadata`` and has its
   own bitcode records and assembly syntax
   (`documented in LangRef <LangRef.html#specialized-metadata-nodes>`_).  The debug
   info verifier has been merged with the main verifier.

 * LLVM IR and APIs are in a period of transition to aid in the removal of
   pointee types from pointer types (the end goal being that pointers are
   typeless/opaque - void*, if you will). Some APIs and IR constructs have been
   modified to take explicit types that are currently checked to match the
   target type of their pre-existing pointer type operands. Further changes are
   still needed, but the more you can avoid using
   ``PointerType::getPointeeType``, the easier the migration will be.

 * Argument-less ``TargetMachine::getSubtarget`` and
   ``TargetMachine::getSubtargetImpl`` have been removed from the tree. Updating
   out-of-tree ports is as simple as implementing a non-virtual version in the
   target, but implementing full ``Function``-based ``TargetSubtargetInfo``
   support is recommended.

 * This is expected to be the last major release of LLVM that supports being
   run on Windows XP and Windows Vista.  For the next major release the minimum
   Windows version requirement will be Windows 7.
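
 The switch-case lowering entry above mentions avoiding unbalanced search
 trees. The following is only a rough sketch of why balance matters, written in
 plain C++ rather than taken from LLVM's implementation; the helper names
 ``balancedComparisons`` and ``chainComparisons`` are hypothetical. It counts
 how many comparisons are needed to reach a case value when the sorted case
 list is split at the middle versus tested one value after another:

 .. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical helper: comparisons needed to find `target` when the sorted
    // case values are searched by repeatedly splitting at the middle element,
    // which corresponds to a balanced search tree. Returns 0 if not found.
    std::size_t balancedComparisons(const std::vector<int> &sortedCases,
                                    int target) {
      std::size_t lo = 0, hi = sortedCases.size(), steps = 0;
      while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        ++steps;
        if (sortedCases[mid] == target)
          return steps;
        if (sortedCases[mid] < target)
          lo = mid + 1;
        else
          hi = mid;
      }
      return 0;
    }

    // Hypothetical helper: comparisons needed in a fully unbalanced tree, which
    // degenerates into testing the case values one after another.
    std::size_t chainComparisons(const std::vector<int> &sortedCases,
                                 int target) {
      for (std::size_t i = 0; i < sortedCases.size(); ++i)
        if (sortedCases[i] == target)
          return i + 1;
      return 0;
    }

 With the eight case values 1, 3, ..., 15, reaching the case 15 takes three
 comparisons in the balanced search and eight in the chain. Profile
 information, which the rewritten lowering can also exploit, is not modeled in
 this sketch.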

 Changes to the MIPS Target
 --------------------------

 During this release the MIPS target has:

 * Added support for MIPS32R3, MIPS32R5, MIPS64R3, MIPS64R5, and microMIPS32.

 * Added support for dynamic stack realignment. This is of particular importance
   to MSA on 32-bit subtargets since vectors always exceed the stack alignment on
   the O32 ABI.

 * Added support for compiler-rt including:

   * Support for the Address and Undefined Behaviour Sanitizers for all MIPS
     subtargets.

   * Support for the Data Flow and Memory Sanitizers for 64-bit subtargets.

   * Support for the Profiler for all MIPS subtargets.

 * Added support for libcxx and libcxxabi.

 * Improved inline assembly support such that memory constraints may now make use
   of the appropriate address offsets available to the instructions. Also, added
   support for the ``ZC`` constraint.

 * Added support for 128-bit integers on 64-bit subtargets and 16-bit floating
   point conversions on all subtargets.

 * Added support for read-only ``.eh_frame`` sections by storing type information
   indirectly.

 * Added support for MCJIT on all 64-bit subtargets as well as MIPS32R6.

 * Added support for fast instruction selection on MIPS32 and MIPS32R2 with PIC.

 * Various bug fixes, including the following notable ones:

   * Fixed 'jumpy' debug line info around calls where calculation of the address
     of the function would inappropriately change the line number.

   * Fixed missing ``__mips_isa_rev`` macro on the MIPS32R6 and MIPS64R6
     subtargets.

   * Fixed representation of NaN when targeting systems using traditional
     encodings. Traditionally, MIPS has used NaN encodings that were compatible
     with IEEE754-1985 but would later be found incompatible with IEEE754-2008.

   * Fixed multiple segfaults and assertions in the disassembler when
     disassembling instructions that have memory operands.

   * Fixed multiple cases of suboptimal code generation involving $zero.

   * Fixed code generation of 128-bit shifts on 64-bit subtargets.

   * Prevented the delay slot filler from filling call delay slots with
     instructions that modify or use $ra.

   * Fixed some remaining N32/N64 calling convention bugs when using small
     structures on big-endian subtargets.

   * Fixed missing sign-extensions that are required by the N32/N64 calling
     convention when generating calls to library functions with 32-bit
     parameters.

   * Corrected the ``int64_t`` typedef to be ``long`` for N64.

   * ``-mno-odd-spreg`` is now honoured for vector insertion/extraction
     operations when using ``-mmsa``.

   * Fixed vector insertion and extraction for MSA on 64-bit subtargets.

   * Corrected the representation of member function pointers. This makes them
     usable on microMIPS subtargets.
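
 The NaN fix listed above concerns the two encodings in use on MIPS. As a
 minimal sketch (standalone C++, not code from the MIPS backend; the names
 ``NaNEncoding`` and ``isQuietNaN`` are invented for this example), the
 difference for single-precision values comes down to the polarity of the most
 significant mantissa bit: IEEE754-2008 uses a set bit to mark a quiet NaN,
 while the traditional MIPS encoding uses a clear bit.

 .. code-block:: c++

    #include <cassert>
    #include <cstdint>

    // Hypothetical names, used only for this illustration.
    enum class NaNEncoding { IEEE754_2008, Legacy };

    // True if the 32-bit pattern is any NaN: exponent all ones, mantissa != 0.
    bool isNaNBits(std::uint32_t bits) {
      return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
    }

    // True if the pattern is a quiet NaN under the given encoding. The two
    // encodings interpret the top mantissa bit (bit 22) with opposite polarity.
    bool isQuietNaN(std::uint32_t bits, NaNEncoding encoding) {
      if (!isNaNBits(bits))
        return false;
      const bool topMantissaBit = (bits & 0x00400000u) != 0;
      return encoding == NaNEncoding::IEEE754_2008 ? topMantissaBit
                                                   : !topMantissaBit;
    }

 For example, the pattern ``0x7FC00000`` is a quiet NaN under IEEE754-2008 but
 a signalling NaN under the traditional encoding, whereas ``0x7FBFFFFF`` is
 quiet only under the traditional encoding.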

 Changes to the PowerPC Target
 -----------------------------

 There are numerous improvements to the PowerPC target in this release:

 * LLVM now supports the ISA 2.07B (POWER8) instruction set, including
   direct moves between general registers and vector registers, and
   built-in support for hardware transactional memory (HTM).  Some missing
   instructions from ISA 2.06 (POWER7) were also added.

 * Code generation for the local-dynamic and global-dynamic thread-local
   storage models has been improved.

 * Loops may be restructured to leverage pre-increment loads and stores.

 * QPX - The vector instruction set used by the IBM Blue Gene/Q supercomputers
   is now supported.

 * Loads from the TOC area are now correctly treated as invariant.

 * PowerPC now has support for i128 and v1i128 types.  The types differ
   in how they are passed in registers for the ELFv2 ABI.

 * Disassembly will now print shorter mnemonic aliases when available.

 * Optional register name prefixes for VSX and QPX registers are now
   supported in the assembly parser.

 * The back end now contains a pass to remove unnecessary vector swaps
   from POWER8 little-endian code generation.  Additional improvements
   are planned for release 3.8.

 * The undefined-behavior sanitizer (UBSan) is now supported for PowerPC.

 * Many new vector programming APIs have been added to altivec.h.
   Additional ones are planned for release 3.8.

 * PowerPC now supports __builtin_call_with_static_chain.

 * PowerPC now supports the revised -mrecip option that permits finer
   control over reciprocal estimates.

 * Many bugs have been identified and fixed.

 Changes to the SystemZ Target
 -----------------------------

 * LLVM no longer attempts to automatically detect the current host CPU when
   invoked natively.

 * Support for all thread-local storage models. (Previous releases would support
   only the local-exec TLS model.)

 * The POPCNT instruction is now used on z196 and above.

 * The RISBGN instruction is now used on zEC12 and above.

 * Support for the transactional-execution facility on zEC12 and above.

 * Support for the z13 processor and its vector facility.


 Changes to the JIT APIs
 -----------------------

 * Added a new C++ JIT API called On Request Compilation, or ORC.

   ORC is a new JIT API inspired by MCJIT but designed to be more testable and
   easier to extend with new features. A key new feature already in tree is lazy,
   function-at-a-time compilation for X86. Also included is a reimplementation of
   MCJIT's API and behavior (OrcMCJITReplacement). MCJIT itself remains in tree
   and continues to be the default JIT ExecutionEngine, though new users are
   encouraged to try ORC out for their projects. (A good place to start is the
   new ORC tutorials under llvm/examples/kaleidoscope/orc.)
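
 To give a feel for what lazy, function-at-a-time compilation means, here is a
 conceptual sketch in standard C++. It does not use the ORC API at all; the
 class ``LazyFunctionTable`` and its members are hypothetical, and "compiling"
 is simulated by a callback that produces a callable. Each function is
 materialized only when it is first called:

 .. code-block:: c++

    #include <cassert>
    #include <functional>
    #include <map>
    #include <string>
    #include <utility>

    // Hypothetical stand-in for a JIT: functions are registered together with
    // a "compiler" callback, which runs only on the first call.
    class LazyFunctionTable {
    public:
      using Fn = std::function<int(int)>;
      using Compiler = std::function<Fn()>;

      // Registers a function without compiling it.
      void define(const std::string &name, Compiler compile) {
        pending_[name] = std::move(compile);
      }

      // Calls a function, compiling it first if this is the first use.
      int call(const std::string &name, int arg) {
        auto ready = compiled_.find(name);
        if (ready == compiled_.end()) {
          auto todo = pending_.find(name);
          assert(todo != pending_.end() && "call to undefined function");
          ready = compiled_.emplace(name, todo->second()).first;
          pending_.erase(todo);
          ++compileCount_;
        }
        return ready->second(arg);
      }

      // Number of functions that have actually been compiled so far.
      int compileCount() const { return compileCount_; }

    private:
      std::map<std::string, Compiler> pending_;
      std::map<std::string, Fn> compiled_;
      int compileCount_ = 0;
    };

 Functions that are defined but never called are never compiled, which is the
 property that makes the lazy approach attractive for large modules.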

 Sub-project Status Update
 =========================

 In addition to the core LLVM 3.7 distribution of production-quality compiler
 infrastructure, the LLVM project includes sub-projects that use the LLVM core
 and share the same distribution license. This section provides updates on these
 sub-projects.

 Polly - The Polyhedral Loop Optimizer in LLVM
 ---------------------------------------------

 `Polly <http://polly.llvm.org>`_ is a polyhedral loop optimization
 infrastructure that provides data-locality optimizations to LLVM-based
 compilers. When compiled as part of clang or loaded as a module into clang,
 it can perform loop optimizations such as tiling, loop fusion or outer-loop
 vectorization. As a generic loop optimization infrastructure it allows
 developers to get a per-loop-iteration model of a loop nest on which detailed
 analysis and transformations can be performed.
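
 As an informal illustration of tiling, one of the loop optimizations mentioned
 above, the following sketch shows a matrix transpose before and after the
 transformation. It is hand-written C++, not Polly output; the function names,
 the row-major ``Matrix`` alias, and the tile size parameter are assumptions
 made for the example.

 .. code-block:: c++

    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <vector>

    using Matrix = std::vector<double>; // row-major n x n storage

    // Original loop nest: walks `in` row by row and `out` column by column.
    void transposeNaive(const Matrix &in, Matrix &out, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
          out[j * n + i] = in[i * n + j];
    }

    // Tiled loop nest: the same iterations, regrouped into tile x tile blocks
    // so that each block of `in` and `out` is reused while it is in cache.
    void transposeTiled(const Matrix &in, Matrix &out, std::size_t n,
                        std::size_t tile) {
      assert(tile > 0);
      for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile)
          for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
            for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
              out[j * n + i] = in[i * n + j];
    }

 Both versions execute exactly the same set of loop iterations and produce the
 same result; only the execution order differs, which is the kind of reordering
 a per-loop-iteration model makes it possible to reason about.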

 Changes since the last release:

 * isl imported into Polly distribution

   `isl <http://repo.or.cz/w/isl.git>`_, the math library Polly uses, has been
   imported into the source code repository of Polly and is now distributed as part
   of Polly. As this was the last external library dependency of Polly, Polly can
   now be compiled right after checking out the Polly source code without the need
   for any additional libraries to be pre-installed.

 * Small integer optimization of isl

   The MIT-licensed imath backend used in `isl <http://repo.or.cz/w/isl.git>`_ for
   arbitrary-width integer computations has been optimized to use native integer
   operations for the common case where the operands of a computation fit into 32
   bits and to fall back to large arbitrary-precision integers only for the
   remaining cases. This optimization has greatly improved the compile-time
   performance of Polly, due both to faster native operations and to a
   reduction in malloc traffic and pointer indirections. Computations
   that use arbitrary-precision integers heavily have been sped up by almost 6x.
   As a result, the compile time of Polly on the Polybench test kernels in the LNT
   suite has been reduced by 20% on average, with individual reductions between
   9% and 43%.

 * Schedule Trees

   Polly now internally uses so-called *schedule trees* to model the loop
   structure it optimizes. A schedule tree is an easy-to-understand tree structure
   that describes a loop nest using integer constraint sets to keep track of
   execution constraints. It allows the developer to use per-tree-node operations
   to modify the loop tree. Programmatic analyses that work on the schedule tree
   (e.g., dependence analysis) also show a visible speedup as they can exploit
   the tree structure of the schedule and need to fall back to ILP-based
   optimization problems less often. Section 6 of `Polyhedral AST generation is
   more than scanning polyhedra
   <http://www.grosser.es/#pub-polyhedral-AST-generation>`_ gives a detailed
   explanation of schedule trees.

 * Scalar and PHI node modeling - Polly as an analysis

   Polly now requires almost no preprocessing to analyse LLVM-IR, which makes it
   easier to use Polly as a pure analysis pass, e.g. to provide more precise
   dependence information to non-polyhedral transformation passes. Originally,
   Polly required the input LLVM-IR to be preprocessed such that all scalar and
   PHI-node dependences were translated to in-memory operations. Since this release,
   Polly has full support for scalar and PHI-node dependences and requires no
   scalar-to-memory translation for such dependences.

 * Modeling of modulo and non-affine conditions

   Polly now supports modulo operations such as ``A[t%2][i][j]``, which appear
   often in stencil computations, and also allows data-dependent conditional
   branches, which result e.g. from ternary conditions such as
   ``A[i] > 255 ? 255 : A[i]``.

 * Delinearization

   Polly now supports the analysis of manually linearized multi-dimensional arrays
   as they result from macros such as
   ``#define 2DARRAY(A,i,j) (A.data[(i) * A.size + (j)])``. Similar constructs appear
   in old C code written before C99, C++ code such as boost::ublas, LLVM IR exported
   from Julia, Matlab-generated code and many others. Our work titled
   `Optimistic Delinearization of Parametrically Sized Arrays
   <http://www.grosser.es/#pub-optimistic-delinerization>`_ gives details.

 * Compile time improvements

   Pratik Bhatu worked on compile-time performance tuning of Polly. His work,
   together with the support for schedule trees and the small integer optimization
   in isl, notably reduced the compile time.

 * Increased compute timeouts

   As Polly's compile time has been notably improved, we were able to relax
   the compile-time safeguards in Polly. As a result, the default configuration
   of Polly can now analyze larger loop nests without running into compile-time
   restrictions.

 * Export Debug Locations via JSCoP file

   Polly's JSCoP import/export format gained support for debug locations that
   show the user the source code location of detected scops.

 * Improved Windows support

   The compilation of Polly on Windows using CMake has been improved and several
   Visual Studio build issues have been addressed.

 * Many bug fixes
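
 The small integer optimization described in the list above follows a common
 pattern: try the operation with native integers first and fall back to
 arbitrary precision only on overflow. The sketch below illustrates that
 pattern in standalone C++. It is not isl's or imath's actual code; the
 function names are hypothetical and the arbitrary-precision fallback is left
 to the caller.

 .. code-block:: c++

    #include <cassert>
    #include <cstdint>
    #include <limits>

    // Returns true if `wide` can be represented as a 32-bit signed integer.
    bool fitsInt32(std::int64_t wide) {
      return wide >= std::numeric_limits<std::int32_t>::min() &&
             wide <= std::numeric_limits<std::int32_t>::max();
    }

    // Fast path for addition: computes in 64 bits, which cannot overflow for
    // 32-bit operands, and reports whether the result is still "small".
    bool tryAddSmall(std::int32_t a, std::int32_t b, std::int32_t &result) {
      const std::int64_t wide = static_cast<std::int64_t>(a) + b;
      if (!fitsInt32(wide))
        return false; // caller falls back to arbitrary precision
      result = static_cast<std::int32_t>(wide);
      return true;
    }

    // Fast path for multiplication, using the same widening approach.
    bool tryMulSmall(std::int32_t a, std::int32_t b, std::int32_t &result) {
      const std::int64_t wide = static_cast<std::int64_t>(a) * b;
      if (!fitsInt32(wide))
        return false; // caller falls back to arbitrary precision
      result = static_cast<std::int32_t>(wide);
      return true;
    }

 In the common case the fast path succeeds, so no heap-allocated big integer
 is created, which is where the reduction in malloc traffic and pointer
 indirections comes from.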

 libunwind
 ---------

 The unwind implementation which used to reside in ``libc++abi`` has been moved into
 a separate repository.  This implementation can still be used for ``libc++abi`` by
 specifying ``-DLIBCXXABI_USE_LLVM_UNWINDER=YES`` and
 ``-DLIBCXXABI_LIBUNWIND_PATH=<path to libunwind source>`` when configuring
 ``libc++abi``. ``LIBCXXABI_USE_LLVM_UNWINDER`` defaults to ``true`` when
 building on ARM.

 The new repository can also be built standalone if just ``libunwind`` is desired.

 External Open Source Projects Using LLVM 3.7
 ============================================

 An exciting aspect of LLVM is that it is used as an enabling technology for
 a lot of other language and tools projects. This section lists some of the
 projects that have already been updated to work with LLVM 3.7.


 LDC - the LLVM-based D compiler
 -------------------------------

 `D <http://dlang.org>`_ is a language with C-like syntax and static typing. It
 pragmatically combines efficiency, control, and modeling power, with safety and
 programmer productivity. D supports powerful concepts like Compile-Time Function
 Execution (CTFE) and Template Meta-Programming, provides an innovative approach
 to concurrency and offers many classical paradigms.

 `LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler
 combined with LLVM as backend to produce efficient native code. LDC targets
 x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on
 PowerPC (32/64 bit). Ports to other architectures like ARM, AArch64 and MIPS64
 are underway.

 Portable Computing Language (pocl)
 ----------------------------------

 In addition to producing an easily portable open source OpenCL
 implementation, another major goal of `pocl <http://portablecl.org/>`_
 is improving performance portability of OpenCL programs with
 compiler optimizations, reducing the need for target-dependent manual
 optimizations. An important part of pocl is a set of LLVM passes used to
 statically parallelize multiple work-items with the kernel compiler, even in
 the presence of work-group barriers.


 TTA-based Co-design Environment (TCE)
 -------------------------------------

 `TCE <http://tce.cs.tut.fi/>`_ is a toolset for designing customized
 exposed datapath processors based on the Transport triggered
 architecture (TTA).

 The toolset provides a complete co-design flow from C/C++
 programs down to synthesizable VHDL/Verilog and parallel program binaries.
 Processor customization points include the register files, function units,
 supported operations, and the interconnection network.

 TCE uses Clang and LLVM for C/C++/OpenCL C language support, target-independent
 optimizations and also for parts of code generation. It generates
 new LLVM-based code generators "on the fly" for the designed processors and
 loads them into the compiler backend as runtime libraries to avoid
 per-target recompilation of larger parts of the compiler chain.

 BPF Compiler Collection (BCC)
 -----------------------------
 `BCC <https://github.com/iovisor/bcc>`_ is a Python + C framework for tracing and
 networking that uses the Clang rewriter, a second pass of Clang, and the BPF
 backend to generate eBPF and push it into the kernel.

 LLVMSharp & ClangSharp
 ----------------------

 `LLVMSharp <http://www.llvmsharp.org>`_ and
 `ClangSharp <http://www.clangsharp.org>`_ are type-safe C# bindings for
 Microsoft .NET and Mono that Platform Invoke into the native libraries.
 ClangSharp is self-hosted and is used to generate LLVMSharp using the
 LLVM-C API.

 `LLVMSharp Kaleidoscope Tutorials <http://www.llvmsharp.org/Kaleidoscope/>`_
 are instructive examples of writing a compiler in C#, with certain improvements
 like using the visitor pattern to generate LLVM IR.

 `ClangSharp PInvoke Generator <http://www.clangsharp.org/PInvoke/>`_ is the
 self-hosting mechanism for LLVM/ClangSharp and is demonstrative of using
 LibClang to generate Platform Invoke (PInvoke) signatures for C APIs.


 Additional Information
 ======================

 A wide variety of additional information is available on the `LLVM web page
 <http://llvm.org/>`_, in particular in the `documentation
 <http://llvm.org/docs/>`_ section.  The web page also contains versions of the
 API documentation that are up to date with the Subversion version of the source
 code.  You can access versions of these documents specific to this release by
 going into the ``llvm/docs/`` directory in the LLVM tree.

 If you have any questions or comments about LLVM, please feel free to contact
 us via the `mailing lists <http://llvm.org/docs/#maillist>`_.
	======================
	LLVM 3.7 Release Notes
	======================

	.. contents::
	:local:

	Introduction
	============

	This document contains the release notes for the LLVM Compiler Infrastructure,
	release 3.7. Here we describe the status of LLVM, including major improvements
	from the previous release, improvements in various subprojects of LLVM, and
	some of the current users of the code. All LLVM releases may be downloaded
	from the `LLVM releases web site <http://llvm.org/releases/>`_.

	For more information about LLVM, including information about the latest
	release, please check out the `main LLVM web site <http://llvm.org/>`_. If you
	have questions or comments, the `LLVM Developer's Mailing List
	<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ is a good place to send
	them.

	Note that if you are reading this file from a Subversion checkout or the main
	LLVM web page, this document applies to the next release, not the current
	one. To see the release notes for a specific release, please see the `releases
	page <http://llvm.org/releases/>`_.

	Non-comprehensive list of changes in this release
	=================================================

	.. NOTE
	For small 1-3 sentence descriptions, just add an entry at the end of
	this list. If your description won't fit comfortably in one bullet
	point (e.g. maybe you would like to give an example of the
	functionality, or simply have a lot to talk about), see the `NOTE` below
	for adding a new subsection.

	* The minimum required Visual Studio version for building LLVM is now 2013
	Update 4.

	* A new documentation page, :doc:`Frontend/PerformanceTips`, contains a
	collection of tips for frontend authors on how to generate IR which LLVM is
	able to effectively optimize.

	* The ``DataLayout`` is no longer optional. All the IR level optimizations expects
	it to be present and the API has been changed to use a reference instead of
	a pointer to make it explicit. The Module owns the datalayout and it has to
	match the one attached to the TargetMachine for generating code.

	In 3.6, a pass was inserted in the pipeline to make the ``DataLayout`` accessible:
	``MyPassManager->add(new DataLayoutPass(MyTargetMachine->getDataLayout()));``
	In 3.7, you don't need a pass, you set the ``DataLayout`` on the ``Module``:
	``MyModule->setDataLayout(MyTargetMachine->createDataLayout());``

	The LLVM C API ``LLVMGetTargetMachineData`` is deprecated to reflect the fact
	that it won't be available anymore from ``TargetMachine`` in 3.8.

	* Comdats are now orthogonal to the linkage. LLVM will not create
	comdats for weak linkage globals and the frontends are responsible
	for explicitly adding them.

	* On ELF we now support multiple sections with the same name and
	comdat. This allows for smaller object files since multiple
	sections can have a simple name (`.text`, `.rodata`, etc).

	* LLVM now lazily loads metadata in some cases. Creating archives
	with IR files with debug info is now 25X faster.

	* llvm-ar can create archives in the BSD format used by OS X.

	* LLVM received a backend for the extended Berkely Packet Filter
	instruction set that can be dynamically loaded into the Linux kernel via the
	`bpf(2) <http://man7.org/linux/man-pages/man2/bpf.2.html>`_ syscall.

	Support for BPF has been present in the kernel for some time, but starting
	from 3.18 has been extended with such features as: 64-bit registers, 8
	additional registers registers, conditional backwards jumps, call
	instruction, shift instructions, map (hash table, array, etc.), 1-8 byte
	load/store from stack, and more.

	Up until now, users of BPF had to write bytecode by hand, or use
	custom generators. This release adds a proper LLVM backend target for the BPF
	bytecode architecture.

	The BPF target is now available by default, and options exist in both Clang
	(-target bpf) or llc (-march=bpf) to pick eBPF as a backend.

	* Switch-case lowering was rewritten to avoid generating unbalanced search trees
	(`PR22262 <http://llvm.org/pr22262>`_) and to exploit profile information
	when available. Some lowering strategies are now disabled when optimizations
	are turned off, to save compile time.

	* The debug info IR class hierarchy now inherits from ``Metadata`` and has its
	own bitcode records and assembly syntax
	(`documented in LangRef <LangRef.html#specialized-metadata-nodes>`_). The debug
	info verifier has been merged with the main verifier.

	* LLVM IR and APIs are in a period of transition to aid in the removal of
	pointer types (the end goal being that pointers are typeless/opaque - void*,
	if you will). Some APIs and IR constructs have been modified to take
	explicit types that are currently checked to match the target type of their
	pre-existing pointer type operands. Further changes are still needed, but the
	more you can avoid using ``PointerType::getPointeeType``, the easier the
	migration will be.

	* Argument-less ``TargetMachine::getSubtarget`` and
	``TargetMachine::getSubtargetImpl`` have been removed from the tree. Updating
	out of tree ports is as simple as implementing a non-virtual version in the
	target, but implementing full ``Function`` based ``TargetSubtargetInfo``
	support is recommended.

	* This is expected to be the last major release of LLVM that supports being
	run on Windows XP and Windows Vista. For the next major release the minimum
	Windows version requirement will be Windows 7.

	Changes to the MIPS Target
	--------------------------

	During this release the MIPS target has:

	* Added support for MIPS32R3, MIPS32R5, MIPS32R3, MIPS32R5, and microMIPS32.

	* Added support for dynamic stack realignment. This is of particular importance
	to MSA on 32-bit subtargets since vectors always exceed the stack alignment on
	the O32 ABI.

	* Added support for compiler-rt including:

	* Support for the Address, and Undefined Behaviour Sanitizers for all MIPS
	subtargets.

	* Support for the Data Flow, and Memory Sanitizer for 64-bit subtargets.

	* Support for the Profiler for all MIPS subtargets.

	* Added support for libcxx, and libcxxabi.

	* Improved inline assembly support such that memory constraints may now make use
	of the appropriate address offsets available to the instructions. Also, added
	support for the ``ZC`` constraint.

	* Added support for 128-bit integers on 64-bit subtargets and 16-bit floating
	point conversions on all subtargets.

	* Added support for read-only ``.eh_frame`` sections by storing type information
	indirectly.

	* Added support for MCJIT on all 64-bit subtargets as well as MIPS32R6.

	* Added support for fast instruction selection on MIPS32 and MIPS32R2 with PIC.

	* Various bug fixes. Including the following notable fixes:

	* Fixed 'jumpy' debug line info around calls where calculation of the address
	of the function would inappropriately change the line number.

	* Fixed missing ``__mips_isa_rev`` macro on the MIPS32R6 and MIPS32R6
	subtargets.

	* Fixed representation of NaN when targeting systems using traditional
	encodings. Traditionally, MIPS has used NaN encodings that were compatible
	with IEEE754-1985 but would later be found incompatible with IEEE754-2008.

	* Fixed multiple segfaults and assertions in the disassembler when
	disassembling instructions that have memory operands.

	* Fixed multiple cases of suboptimal code generation involving $zero.

	* Fixed code generation of 128-bit shifts on 64-bit subtargets.

	* Prevented the delay slot filler from filling call delay slots with
	instructions that modify or use $ra.

	* Fixed some remaining N32/N64 calling convention bugs when using small
	structures on big-endian subtargets.

	* Fixed missing sign-extensions that are required by the N32/N64 calling
	convention when generating calls to library functions with 32-bit
	parameters.

	* Corrected the ``int64_t`` typedef to be ``long`` for N64.

	* ``-mno-odd-spreg`` is now honoured for vector insertion/extraction
	operations when using -mmsa.

	* Fixed vector insertion and extraction for MSA on 64-bit subtargets.

	* Corrected the representation of member function pointers. This makes them
	usable on microMIPS subtargets.

	Changes to the PowerPC Target
	-----------------------------

	There are numerous improvements to the PowerPC target in this release:

	* LLVM now supports the ISA 2.07B (POWER8) instruction set, including
	direct moves between general registers and vector registers, and
	built-in support for hardware transactional memory (HTM). Some missing
	instructions from ISA 2.06 (POWER7) were also added.

	* Code generation for the local-dynamic and global-dynamic thread-local
	storage models has been improved.

	* Loops may be restructured to leverage pre-increment loads and stores.

	* QPX - The vector instruction set used by the IBM Blue Gene/Q supercomputers
	is now supported.

	* Loads from the TOC area are now correctly treated as invariant.

	* PowerPC now has support for i128 and v1i128 types. The types differ
	in how they are passed in registers for the ELFv2 ABI.

	* Disassembly will now print shorter mnemonic aliases when available.

	* Optional register name prefixes for VSX and QPX registers are now
	supported in the assembly parser.

	* The back end now contains a pass to remove unnecessary vector swaps
	from POWER8 little-endian code generation. Additional improvements
	are planned for release 3.8.

	* The undefined-behavior sanitizer (UBSan) is now supported for PowerPC.

	* Many new vector programming APIs have been added to altivec.h.
	Additional ones are planned for release 3.8.

	* PowerPC now supports __builtin_call_with_static_chain.

	* PowerPC now supports the revised -mrecip option that permits finer
	control over reciprocal estimates.

	* Many bugs have been identified and fixed.

	Changes to the SystemZ Target
	-----------------------------

	* LLVM no longer attempts to automatically detect the current host CPU when
	invoked natively.

	* Support for all thread-local storage models. (Previous releases would support
	only the local-exec TLS model.)

	* The POPCNT instruction is now used on z196 and above.

	* The RISBGN instruction is now used on zEC12 and above.

	* Support for the transactional-execution facility on zEC12 and above.

	* Support for the z13 processor and its vector facility.


	Changes to the JIT APIs
	-----------------------

	* Added a new C++ JIT API called On Request Compilation, or ORC.

	ORC is a new JIT API inspired by MCJIT but designed to be more testable, and
	easier to extend with new features. A key new feature already in tree is lazy,
	function-at-a-time compilation for X86. Also included is a reimplementation of
	MCJIT's API and behavior (OrcMCJITReplacement). MCJIT itself remains in tree,
	and continues to be the default JIT ExecutionEngine, though new users are
	encouraged to try ORC out for their projects. (A good place to start is the
	new ORC tutorials under llvm/examples/kaleidoscope/orc).

	Sub-project Status Update
	=========================

	In addition to the core LLVM 3.7 distribution of production-quality compiler
	infrastructure, the LLVM project includes sub-projects that use the LLVM core
	and share the same distribution license. This section provides updates on these
	sub-projects.

	Polly - The Polyhedral Loop Optimizer in LLVM
	---------------------------------------------

	`Polly <http://polly.llvm.org>`_ is a polyhedral loop optimization
	infrastructure that provides data-locality optimizations to LLVM-based
	compilers. When compiled as part of clang or loaded as a module into clang,
	it can perform loop optimizations such as tiling, loop fusion or outer-loop
	vectorization. As a generic loop optimization infrastructure it allows
	developers to get a per-loop-iteration model of a loop nest on which detailed
	analysis and transformations can be performed.

	Changes since the last release:

	* isl imported into Polly distribution

	`isl <http://repo.or.cz/w/isl.git>`_, the math library Polly uses, has been
	imported into the source code repository of Polly and is now distributed as part
	of Polly. As this was the last external library dependency of Polly, Polly can
	now be compiled right after checking out the Polly source code without the need
	for any additional libraries to be pre-installed.

	* Small integer optimization of isl

	The MIT licensed imath backend used in `isl <http://repo.or.cz/w/isl.git>`_ for
	arbitrary width integer computations has been optimized to use native integer
	operations for the common case where the operands of a computation fit into 32
	bits and to only fall back to large arbitrary precision integers for the
	remaining cases. This optimization has greatly improved the compile-time
	performance of Polly, both due to faster native operations and due to a
	reduction in malloc traffic and pointer indirections. Computations
	that use arbitrary precision integers heavily have been sped up by almost 6x.
	As a result, the compile time of Polly on the Polybench test kernels in the LNT
	suite has been reduced by 20% on average, with compile-time reductions between
	9% and 43%.
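
	The idea of a small-integer fast path can be sketched in C as follows. This
	is only an illustration under stated assumptions: the type ``sketch_int`` and
	the functions ``sketch_from_i32`` and ``sketch_mul`` are hypothetical names,
	not part of isl or imath, and a 64-bit field stands in for a real
	heap-allocated arbitrary precision integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch type; NOT the data structure used by isl or imath. */
typedef struct {
    bool is_small;   /* true: the value is held in `small` */
    int32_t small;   /* fast-path storage, no heap allocation */
    int64_t big;     /* stand-in for an arbitrary precision integer */
} sketch_int;

static sketch_int sketch_from_i32(int32_t v) {
    return (sketch_int){ .is_small = true, .small = v, .big = 0 };
}

/* Multiply two values. When both operands are small, the product is computed
 * with a native 64-bit multiplication (which cannot overflow for 32-bit
 * inputs) and kept small if it still fits into 32 bits. */
static sketch_int sketch_mul(sketch_int a, sketch_int b) {
    if (a.is_small && b.is_small) {
        int64_t wide = (int64_t)a.small * (int64_t)b.small;
        if (wide >= INT32_MIN && wide <= INT32_MAX)
            return (sketch_int){ .is_small = true, .small = (int32_t)wide, .big = 0 };
        /* Result no longer fits: switch to the large representation. */
        return (sketch_int){ .is_small = false, .small = 0, .big = wide };
    }
    /* Slow path. A real implementation would call into the arbitrary
     * precision library here; this stand-in assumes the result fits into
     * 64 bits. */
    int64_t x = a.is_small ? a.small : a.big;
    int64_t y = b.is_small ? b.small : b.big;
    return (sketch_int){ .is_small = false, .small = 0, .big = x * y };
}
```

	In this sketch, ``sketch_mul(sketch_from_i32(3), sketch_from_i32(4))`` stays on
	the fast path, whereas multiplying 100000 by itself overflows 32 bits and
	falls back to the large representation.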

	* Schedule Trees

	Polly now internally uses so-called *schedule trees* to model the loop
	structure it optimizes. Schedule trees are an easy-to-understand tree structure
	that describes a loop nest using integer constraint sets to keep track of
	execution constraints. They allow the developer to use per-tree-node operations
	to modify the loop tree. Programmatic analyses that work on the schedule tree
	(e.g., dependence analysis) also show a visible speedup, as they can exploit
	the tree structure of the schedule and need to fall back to ILP-based
	optimization problems less often. Section 6 of `Polyhedral AST generation is
	more than scanning polyhedra
	<http://www.grosser.es/#pub-polyhedral-AST-generation>`_ gives a detailed
	explanation of schedule trees.

	* Scalar and PHI node modeling - Polly as an analysis

	Polly now requires almost no preprocessing to analyse LLVM-IR, which makes it
	easier to use Polly as a pure analysis pass, e.g. to provide more precise
	dependence information to non-polyhedral transformation passes. Originally,
	Polly required the input LLVM-IR to be preprocessed such that all scalar and
	PHI-node dependences were translated to in-memory operations. Since this release,
	Polly has full support for scalar and PHI-node dependences and requires no
	scalar-to-memory translation for these kinds of dependences.

	* Modeling of modulo and non-affine conditions

	Polly now supports modulo operations such as A[t%2][i][j], as they often appear
	in stencil computations, and also allows data-dependent conditional
	branches, as they result e.g. from ternary conditions such as A[i] > 255 ? 255 :
	A[i].
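
	As an illustration, the following C sketch shows the two kinds of source
	patterns described above. The function names ``stencil`` and ``clamp`` and the
	size ``N`` are assumptions chosen for this example and do not come from Polly
	or its test suite.

```c
#define N 8

/* Stencil with two time planes: the plane read in step t is selected with
 * the modulo expression t % 2, the plane written with (t + 1) % 2. */
static void stencil(int steps, double A[2][N][N]) {
    for (int t = 0; t < steps; t++)
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                A[(t + 1) % 2][i][j] =
                    0.25 * (A[t % 2][i - 1][j] + A[t % 2][i + 1][j] +
                            A[t % 2][i][j - 1] + A[t % 2][i][j + 1]);
}

/* Data-dependent condition: whether the store writes 255 or the old value
 * depends on the contents of A, not only on the loop counter. */
static void clamp(int n, int *A) {
    for (int i = 0; i < n; i++)
        A[i] = A[i] > 255 ? 255 : A[i];
}
```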

	* Delinearization

	Polly now supports the analysis of manually linearized multi-dimensional arrays
	as they result from macros such as
	"#define 2DARRAY(A,i,j) (A.data[(i) * A.size + (j)])". Similar constructs appear
	in old C code written before C99, C++ code such as boost::ublas, LLVM exported
	from Julia, Matlab generated code and many others. Our work titled
	`Optimistic Delinearization of Parametrically Sized Arrays
	<http://www.grosser.es/#pub-optimistic-delinerization>`_ gives details.
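
	A minimal C sketch of such a manually linearized access is shown below. The
	``matrix`` type, the ``scale`` function, and the macro name ``ARRAY2D`` are
	hypothetical and chosen only for this example (the macro is renamed here
	because a C identifier cannot start with a digit). Conceptually,
	delinearization aims to recover the two-dimensional view ``A[i][j]`` with an
	inner dimension of ``A.size`` elements from the flat index expression.

```c
#include <stddef.h>

/* Hypothetical row-major matrix with a run-time row length. */
typedef struct {
    double *data;  /* flat storage */
    size_t size;   /* number of elements per row */
} matrix;

/* Manually linearized two-dimensional access. */
#define ARRAY2D(A, i, j) ((A).data[(i) * (A).size + (j)])

/* Scale the first `rows` rows of A. After macro expansion the subscript is
 * the single expression i * A.size + j, which hides the 2D structure. */
static void scale(matrix A, size_t rows, double factor) {
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < A.size; j++)
            ARRAY2D(A, i, j) *= factor;
}
```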

	* Compile time improvements

	Pratik Bahtu worked on compile-time performance tuning of Polly. His work
	together with the support for schedule trees and the small integer optimization
	in isl notably reduced the compile time.

	* Increased compute timeouts

	As Polly's compile time has been notably improved, we were able to increase
	the compile-time safeguards in Polly. As a result, the default configuration
	of Polly can now analyze larger loop nests without running into compile time
	restrictions.

	* Export Debug Locations via JSCoP file

	Polly's JSCoP import/export format gained support for debug locations that show
	the user the source code location of detected scops.

	* Improved Windows support

	The compilation of Polly on Windows using CMake has been improved and several
	Visual Studio build issues have been addressed.

	* Many bug fixes

	libunwind
	---------

	The unwind implementation which used to reside in `libc++abi` has been moved into
	a separate repository. This implementation can still be used for `libc++abi` by
	specifying `-DLIBCXXABI_USE_LLVM_UNWINDER=YES` and
	`-DLIBCXXABI_LIBUNWIND_PATH=<path to libunwind source>` when configuring
	`libc++abi`, which defaults to `true` when building on ARM.

	The new repository can also be built standalone if just `libunwind` is desired.

	External Open Source Projects Using LLVM 3.7
	============================================

	An exciting aspect of LLVM is that it is used as an enabling technology for
	a lot of other language and tools projects. This section lists some of the
	projects that have already been updated to work with LLVM 3.7.


	LDC - the LLVM-based D compiler
	-------------------------------

	`D <http://dlang.org>`_ is a language with C-like syntax and static typing. It
	pragmatically combines efficiency, control, and modeling power, with safety and
	programmer productivity. D supports powerful concepts like Compile-Time Function
	Execution (CTFE) and Template Meta-Programming, provides an innovative approach
	to concurrency and offers many classical paradigms.

	`LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler
	combined with LLVM as backend to produce efficient native code. LDC targets
	x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on
	PowerPC (32/64 bit). Ports to other architectures like ARM, AArch64 and MIPS64
	are underway.

	Portable Computing Language (pocl)
	----------------------------------

	In addition to producing an easily portable open source OpenCL
	implementation, another major goal of `pocl <http://portablecl.org/>`_
	is improving performance portability of OpenCL programs with
	compiler optimizations, reducing the need for target-dependent manual
	optimizations. An important part of pocl is a set of LLVM passes used to
	statically parallelize multiple work-items with the kernel compiler, even in
	the presence of work-group barriers.


	TTA-based Co-design Environment (TCE)
	-------------------------------------

	`TCE <http://tce.cs.tut.fi/>`_ is a toolset for designing customized
	exposed datapath processors based on the Transport triggered
	architecture (TTA).

	The toolset provides a complete co-design flow from C/C++
	programs down to synthesizable VHDL/Verilog and parallel program binaries.
	Processor customization points include the register files, function units,
	supported operations, and the interconnection network.

	TCE uses Clang and LLVM for C/C++/OpenCL C language support, target independent
	optimizations and also for parts of code generation. It generates
	new LLVM-based code generators "on the fly" for the designed processors and
	loads them into the compiler backend as runtime libraries to avoid
	per-target recompilation of larger parts of the compiler chain.

	BPF Compiler Collection (BCC)
	-----------------------------

	`BCC <https://github.com/iovisor/bcc>`_ is a Python + C framework for tracing and
	networking that uses the Clang rewriter, a second pass of Clang, and the BPF
	backend to generate eBPF and push it into the kernel.

	LLVMSharp & ClangSharp
	----------------------

	`LLVMSharp <http://www.llvmsharp.org>`_ and
	`ClangSharp <http://www.clangsharp.org>`_ are type-safe C# bindings for
	Microsoft .NET and Mono that Platform Invoke into the native libraries.
	ClangSharp is self-hosted and is used to generate LLVMSharp using the
	LLVM-C API.

	`LLVMSharp Kaleidoscope Tutorials <http://www.llvmsharp.org/Kaleidoscope/>`_
	are instructive examples of writing a compiler in C#, with certain improvements
	like using the visitor pattern to generate LLVM IR.

	`ClangSharp PInvoke Generator <http://www.clangsharp.org/PInvoke/>`_ is the
	self-hosting mechanism for LLVM/ClangSharp and is demonstrative of using
	LibClang to generate Platform Invoke (PInvoke) signatures for C APIs.


	Additional Information
	======================

	A wide variety of additional information is available on the `LLVM web page
	<http://llvm.org/>`_, in particular in the `documentation
	<http://llvm.org/docs/>`_ section. The web page also contains versions of the
	API documentation which are up to date with the Subversion version of the source
	code. You can access versions of these documents specific to this release by
	going into the ``llvm/docs/`` directory in the LLVM tree.

	If you have any questions or comments about LLVM, please feel free to contact
	us via the `mailing lists <http://llvm.org/docs/#maillist>`_.