[OpenMP][libomp][doc] Add environment variables documentation

Add documentation for the environment variables for libomp

Differential Revision: https://reviews.llvm.org/D114269
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index aa990d5..86e74dd 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -12,6 +12,633 @@
 the LLVM/OpenMP host runtime, aka.  `libomp.so`, is available as a `pdf
 <https://openmp.llvm.org/Reference.pdf>`_.
 
+.. _libomp_environment_vars:
+
+Environment Variables
+^^^^^^^^^^^^^^^^^^^^^
+
+OMP_CANCELLATION
+""""""""""""""""
+
+Enables cancellation of the innermost enclosing region of the type specified.
+If set to ``true``, the effects of the cancel construct and of cancellation
+points are enabled and cancellation is activated. If set to ``false``,
+cancellation is disabled and the cancel construct and cancellation points are
+effectively ignored.
+
+.. note::
+   Internal barrier code will work differently depending on whether cancellation
+   is enabled. Barrier code should repeatedly check the global flag to figure
+   out if cancellation has been triggered. If a thread observes cancellation, it
+   should leave the barrier prematurely with the return value 1 (and may wake up
+   other threads). Otherwise, it should leave the barrier with the return value 0.
+
+Enables (``true``) or disables (``false``) cancellation of the innermost
+enclosing region of the type specified.
+
+**Default:** ``false``
+
+
+OMP_DISPLAY_ENV
+"""""""""""""""
+
+Enables (``true``) or disables (``false``) the printing to ``stderr`` of
+the OpenMP version number and the values associated with the OpenMP
+environment variables.
+
+Possible values are: ``true``, ``false``, or ``verbose``.
+
+**Default:** ``false``
+
+OMP_DEFAULT_DEVICE
+""""""""""""""""""
+
+Sets the device that will be used in a target region. The OpenMP routine
+``omp_set_default_device`` or a device clause in a parallel pragma can override
+this variable. If no device with the specified device number exists, the code is
+executed on the host. If this environment variable is not set, device number 0
+is used.
+
+OMP_DYNAMIC
+"""""""""""
+
+Enables (``true``) or disables (``false``) the dynamic adjustment of the
+number of threads.
+
+| **Default:** ``false``
+
+OMP_MAX_ACTIVE_LEVELS
+"""""""""""""""""""""
+
+The maximum number of levels of parallel nesting for the program.
+
+| **Default:** ``1``
+
+OMP_NESTED
+""""""""""
+
+.. warning::
+    Deprecated. Please use ``OMP_MAX_ACTIVE_LEVELS`` to control nested parallelism
+
+Enables (``true``) or disables (``false``) nested parallelism.
+
+| **Default:** ``false``
+
+OMP_NUM_THREADS
+"""""""""""""""
+
+Sets the maximum number of threads to use for OpenMP parallel regions if no
+other value is specified in the application.
+
+The value can be a single integer, in which case it specifies the number of threads
+for all parallel regions. The value can also be a comma-separated list of integers,
+in which case each integer specifies the number of threads for a parallel
+region at that particular nesting level.
+
+The first position in the list represents the outer-most parallel nesting level,
+the second position represents the next-inner parallel nesting level, and so on.
+At any level, the integer can be left out of the list. If the first integer in a
+list is left out, it implies the normal default value for threads is used at the
+outer-most level. If the integer is left out of any other level, the number of
+threads for that level is inherited from the previous level.
+
+| **Default:** The number of processors visible to the operating system on which the program is executed.
+| **Syntax:** ``OMP_NUM_THREADS=value[,value]*``
+| **Example:** ``OMP_NUM_THREADS=4,3``
+
+OMP_PLACES
+""""""""""
+
+Specifies an explicit ordered list of places, either as an abstract name
+describing a set of places or as an explicit list of places described by
+non-negative numbers. An exclusion operator, ``!``, can also be used to exclude
+the number or place immediately following the operator.
+
+For **explicit lists**, an ordered list of places is specified with each place
+represented as a set of non-negative numbers. The non-negative numbers represent
+operating system logical processor numbers and can be thought of as an OS affinity mask.
+
+Individual places can be specified through two methods.
+Both the **examples** below represent the same place.
+
+* An explicit list of comma-separated non-negatives numbers **Example:** ``{0,2,4,6}``
+* An interval with notation ``<lower-bound>:<length>[:<stride>]``.  **Example:** ``{0:4:2}``. When ``<stride>`` is omitted, a unit stride is assumed.
+  The interval notation represents this set of numbers:
+
+::
+
+    <lower-bound>, <lower-bound> + <stride>, ..., <lower-bound> + (<length> - 1) * <stride>
+
+
+A place list can also be specified using the same interval
+notation: ``{place}:<length>[:<stride>]``.
+This represents the list of length ``<length>`` places determined by the following:
+
+.. code-block:: c
+
+    {place}, {place} + <stride>, ..., {place} + (<length>-1)*<stride>
+    Where given {place} and integer N, {place} + N = {place with every number offset by N}
+    Example: {0,3,6}:4:1 represents {0,3,6}, {1,4,7}, {2,5,8}, {3,6,9}
+
+**Examples of explicit lists:**
+These all represent the same set of places
+
+::
+
+     OMP_PLACES="{0,1,2,3},{4,5,6,7},{8,9,10,11},{12,13,14,15}"
+     OMP_PLACES="{0:4},{4:4},{8:4},{12:4}"
+     OMP_PLACES="{0:4}:4:4"
+
+.. note::
+    When specifying a place using a set of numbers, if any number cannot be
+    mapped to a processor on the target platform, then that number is
+    ignored within the place, but the rest of the place is kept intact.
+    If all numbers within a place are invalid, then the entire place is removed
+    from the place list, but the rest of place list is kept intact.
+
+The **abstract names** listed below are understood by the run-time environment:
+
+* ``threads:`` Each place corresponds to a single hardware thread.
+* ``cores:`` Each place corresponds to a single core (having one or more hardware threads).
+* ``sockets:`` Each place corresponds to a single socket (consisting of one or more cores).
+* ``numa_domains:`` Each place corresponds to a single NUMA domain (consisting of one or more cores).
+* ``ll_caches:`` Each place corresponds to a last-level cache (consisting of one or more cores).
+
+The abstract name may be appended by a positive number in parentheses to
+denote the length of the place list to be created, that is ``abstract_name(num-places)``.
+If the optional number isn't specified, then the runtime will use all available
+resources of type ``abstract_name``. When requesting fewer places than available
+on the system, the first available resources as determined by ``abstract_name``
+are used. When requesting more places than available on the system, only the
+available resources are used.
+
+**Examples of abstract names:**
+::
+
+    OMP_PLACES=threads
+    OMP_PLACES=threads(4)
+
+OMP_PROC_BIND (Windows, Linux)
+""""""""""""""""""""""""""""""
+Sets the thread affinity policy to be used for parallel regions at the
+corresponding nested level. Enables (``true``) or disables (``false``)
+the binding of threads to processor contexts. If enabled, this is the
+same as specifying ``KMP_AFFINITY=scatter``. If disabled, this is the
+same as specifying ``KMP_AFFINITY=none``.
+
+**Acceptable values:** ``true``, ``false``, or a comma separated list, each
+element of which is one of the following values: ``master``, ``close``, ``spread``, or ``primary``.
+
+**Default:** ``false``
+
+.. warning::
+    ``master`` is deprecated. The semantics of ``master`` are the same as ``primary``.
+
+If set to ``false``, the execution environment may move OpenMP threads between
+OpenMP places, thread affinity is disabled, and ``proc_bind`` clauses on
+parallel constructs are ignored. Otherwise, the execution environment should
+not move OpenMP threads between OpenMP places, thread affinity is enabled, and
+the initial thread is bound to the first place in the OpenMP place list.
+
+If set to ``primary``, all threads are bound to the same place as the primary
+thread.
+
+If set to ``close``, threads are bound to successive places, near where the
+primary thread is bound.
+
+If set to ``spread``, the primary thread's partition is subdivided and threads
+are bound to single place successive sub-partitions.
+
+| **Related environment variables:** ``KMP_AFFINITY`` (overrides ``OMP_PROC_BIND``).
+
+OMP_SCHEDULE
+""""""""""""
+Sets the run-time schedule type and an optional chunk size.
+
+| **Default:** ``static``, no chunk size specified
+| **Syntax:** ``OMP_SCHEDULE="kind[,chunk_size]"``
+
+OMP_STACKSIZE
+"""""""""""""
+
+Sets the number of bytes to allocate for each OpenMP thread to use as the
+private stack for the thread. Recommended size is 16M.
+
+Use the optional suffixes to specify byte units: ``B`` (bytes), ``K`` (Kilobytes),
+``M`` (Megabytes), ``G`` (Gigabytes), or ``T`` (Terabytes) to specify the units.
+If you specify a value without a suffix, the byte unit
+is assumed to be ``K`` (Kilobytes).
+
+This variable does not affect the native operating system threads created by the
+user program, or the thread executing the sequential part of an OpenMP program.
+
+The ``kmp_{set,get}_stacksize_s()`` routines set/retrieve the value.
+The ``kmp_set_stacksize_s()`` routine must be called from sequential part, before
+first parallel region is created. Otherwise, calling ``kmp_set_stacksize_s()``
+has no effect.
+
+| **Default:**
+
+* 32-bit architecture: ``2M``
+* 64-bit architecture: ``4M``
+
+| **Related environment variables:** ``KMP_STACKSIZE`` (overrides ``OMP_STACKSIZE``).
+| **Example:** ``OMP_STACKSIZE=8M``
+
+OMP_THREAD_LIMIT
+""""""""""""""""
+
+Limits the number of simultaneously-executing threads in an OpenMP program.
+
+If this limit is reached and another native operating system thread encounters
+OpenMP API calls or constructs, the program can abort with an error message.
+If this limit is reached when an OpenMP parallel region begins, a one-time
+warning message might be generated indicating that the number of threads in
+the team was reduced, but the program will continue.
+
+The ``omp_get_thread_limit()`` routine returns the value of the limit.
+
+| **Default:** No enforced limit
+| **Related environment variable:** ``KMP_ALL_THREADS`` (overrides ``OMP_THREAD_LIMIT``).
+
+OMP_WAIT_POLICY
+"""""""""""""""
+
+Decides whether threads spin (active) or yield (passive) while they are waiting.
+``OMP_WAIT_POLICY=active`` is an alias for ``KMP_LIBRARY=turnaround``, and
+``OMP_WAIT_POLICY=passive`` is an alias for ``KMP_LIBRARY=throughput``.
+
+| **Default:** ``passive``
+
+.. note::
+    Although the default is ``passive``, unless the user has explicitly set
+    ``OMP_WAIT_POLICY``, there is a small period of active spinning determined
+    by ``KMP_BLOCKTIME``.
+
+KMP_AFFINITY (Windows, Linux)
+"""""""""""""""""""""""""""""
+
+Enables run-time library to bind threads to physical processing units.
+
+You must set this environment variable before the first parallel region, or
+certain API calls including ``omp_get_max_threads()``, ``omp_get_num_procs()``
+and any affinity API calls.
+
+**Syntax:** ``KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]``
+
+``modifiers`` are optional strings consisting of a keyword and possibly a specifier
+
+* ``respect`` (default) and ``norespect`` - determine whether to respect the original process affinity mask.
+* ``verbose`` and ``noverbose`` (default) - determine whether to display affinity information.
+* ``warnings`` (default) and ``nowarnings`` - determine whether to display warnings during affinity detection.
+* ``granularity=<specifier>`` - takes the following specifiers ``thread``, ``core`` (default), ``tile``,
+  ``socket``, ``die``, ``group`` (Windows only).
+  The granularity describes the lowest topology levels that OpenMP threads are allowed to float within a topology map.
+  For example, if ``granularity=core``, then the OpenMP threads will be allowed to move between logical processors within
+  a single core. If ``granularity=thread``, then the OpenMP threads will be restricted to a single logical processor.
+* ``proclist=[<proc_list>]`` - The ``proc_list`` is specified by
+
++--------------------+----------------------------------------+
+| Value              |         Description                    |
++====================+========================================+
+|   <proc_list> :=   |   <proc_id> | { <id_list> }            |
++--------------------+----------------------------------------+
+|   <id_list> :=     |   <proc_id> | <proc_id>,<id_list>      |
++--------------------+----------------------------------------+
+
+Where each ``proc_id`` represents an operating system logical processor ID.
+For example, ``proclist=[3,0,{1,2},{0,3}]`` with ``OMP_NUM_THREADS=4`` would place thread 0 on
+OS logical processor 3, thread 1 on OS logical processor 0, thread 2 on both OS logical
+processors 1 & 2, and thread 3 on OS logical processors 0 & 3.
+
+``type`` is the thread affinity policy to choose.
+Valid choices are ``none``, ``balanced``, ``compact``, ``scatter``, ``explicit``, ``disabled``
+
+* type ``none`` (default) - Does not bind OpenMP threads to particular thread contexts;
+  however, if the operating system supports affinity, the compiler still uses the
+  OpenMP thread affinity interface to determine machine topology.
+  Specify ``KMP_AFFINITY=verbose,none`` to list a machine topology map.
+* type ``compact`` - Specifying compact assigns the OpenMP thread <n>+1 to a free thread
+  context as close as possible to the thread context where the <n> OpenMP thread was
+  placed. For example, in a topology map, the nearer a node is to the root, the more
+  significance the node has when sorting the threads.
+* type ``scatter`` - Specifying scatter distributes the threads as evenly as
+  possible across the entire system. ``scatter`` is the opposite of ``compact``; so the
+  leaves of the node are most significant when sorting through the machine topology map.
+* type ``balanced`` - Places threads on separate cores until all cores have at least one thread,
+  similar to the ``scatter`` type. However, when the runtime must use multiple hardware thread
+  contexts on the same core, the balanced type ensures that the OpenMP thread numbers are close
+  to each other, which scatter does not do. This affinity type is supported on the CPU only for
+  single socket systems.
+* type ``explicit`` - Specifying explicit assigns OpenMP threads to a list of OS proc IDs that
+  have been explicitly specified by using the ``proclist`` modifier, which is required
+  for this affinity type.
+* type ``disabled`` - Specifying disabled completely disables the thread affinity interfaces.
+  This forces the OpenMP run-time library to behave as if the affinity interface was not
+  supported by the operating system. This includes the low-level API interfaces such
+  as ``kmp_set_affinity`` and ``kmp_get_affinity``, which have no effect and will return
+  a nonzero error code.
+
+For both ``compact`` and ``scatter``, ``permute`` and ``offset`` are allowed;
+however, if you specify only one integer, the runtime interprets the value as
+a permute specifier. **Both permute and offset default to 0.**
+
+The ``permute`` specifier controls which levels are most significant when sorting
+the machine topology map. A value for ``permute`` forces the mappings to make the
+specified number of most significant levels of the sort the least significant,
+and it inverts the order of significance. The root node of the tree is not
+considered a separate level for the sort operations.
+
+The ``offset`` specifier indicates the starting position for thread assignment.
+
+| **Default:** ``noverbose,warnings,respect,granularity=core,none``
+| **Related environment variable:** ``OMP_PROC_BIND`` (``KMP_AFFINITY`` takes precedence)
+
+.. note::
+    On Windows with multiple processor groups, the norespect affinity modifier
+    is assumed when the process affinity mask equals a single processor group
+    (which is default on Windows). Otherwise, the respect affinity modifier is used.
+
+.. note::
+    On Windows with multiple processor groups, if the granularity is too coarse, it
+    will be set to ``granularity=group``. For example, if two processor groups exist
+    across one socket, and ``granularity=socket`` the runtime will shift the
+    granularity down to group since that is the largest granularity allowed by the OS.
+
+KMP_ALL_THREADS
+"""""""""""""""
+
+Limits the number of simultaneously-executing threads in an OpenMP program.
+If this limit is reached and another native operating system thread encounters
+OpenMP API calls or constructs, then the program may abort with an error
+message. If this limit is reached at the time an OpenMP parallel region begins,
+a one-time warning message may be generated indicating that the number of
+threads in the team was reduced, but the program will continue execution.
+
+| **Default:** No enforced limit.
+| **Related environment variable:** ``OMP_THREAD_LIMIT`` (``KMP_ALL_THREADS`` takes precedence)
+
+KMP_BLOCKTIME
+"""""""""""""
+
+Sets the time, in milliseconds, that a thread should wait, after completing
+the execution of a parallel region, before sleeping.
+
+Use the optional character suffixes: ``s`` (seconds), ``m`` (minutes),
+``h`` (hours), or ``d`` (days) to specify the units.
+
+Specify infinite for an unlimited wait time.
+
+| **Default:** 200 milliseconds
+| **Related Environment Variable:** ``KMP_LIBRARY``
+| **Example:** ``KMP_BLOCKTIME=1s``
+
+KMP_CPUINFO_FILE
+""""""""""""""""
+
+Specifies an alternate file name for a file containing the machine topology
+description. The file must be in the same format as :file:`/proc/cpuinfo`.
+
+**Default:** None
+
+KMP_DETERMINISTIC_REDUCTION
+"""""""""""""""""""""""""""
+
+Enables (``true``) or disables (``false``) the use of a specific ordering of
+the reduction operations for implementing the reduction clause for an OpenMP
+parallel region. This has the effect that, for a given number of threads, in
+a given parallel region, for a given data set and reduction operation, a
+floating point reduction done for an OpenMP reduction clause has a consistent
+floating point result from run to run, since round-off errors are identical.
+
+| **Default:** ``false``
+| **Example:** ``KMP_DETERMINISTIC_REDUCTION=true``
+
+KMP_DYNAMIC_MODE
+""""""""""""""""
+
+Selects the method used to determine the number of threads to use for a parallel
+region when ``OMP_DYNAMIC=true``. Possible values: (``load_balance`` | ``thread_limit``), where,
+
+* ``load_balance``: tries to avoid using more threads than available execution units on the machine;
+* ``thread_limit``: tries to avoid using more threads than total execution units on the machine.
+
+**Default:** ``load_balance`` (on all supported platforms)
+
+KMP_HOT_TEAMS_MAX_LEVEL
+"""""""""""""""""""""""
+Sets the maximum nested level to which teams of threads will be hot.
+
+.. note::
+    A hot team is a team of threads optimized for faster reuse by subsequent
+    parallel regions. In a hot team, threads are kept ready for execution of
+    the next parallel region, in contrast to the cold team, which is freed
+    after each parallel region, with its threads going into a common pool
+    of threads.
+
+For values of 2 and above, nested parallelism should be enabled.
+
+**Default:** 1
+
+KMP_HOT_TEAMS_MODE
+""""""""""""""""""
+
+Specifies the run-time behavior when the number of threads in a hot team is reduced.
+Possible values:
+
+* ``0`` - Extra threads are freed and put into a common pool of threads.
+* ``1`` - Extra threads are kept in the team in reserve, for faster reuse
+  in subsequent parallel regions.
+
+**Default:** 0
+
+KMP_HW_SUBSET
+"""""""""""""
+
+Specifies the subset of available hardware resources for the hardware topology
+hierarchy. The subset is specified in terms of number of units per upper layer
+unit starting from top layer downwards. E.g. the number of sockets (top layer
+units), cores per socket, and the threads per core, to use with an OpenMP
+application, as an alternative to writing complicated explicit affinity settings
+or a limiting process affinity mask. You can also specify an offset value to set
+which resources to use.
+
+An extended syntax is available when ``KMP_TOPOLOGY_METHOD=hwloc``. Depending on what
+resources are detected, you may be able to specify additional resources, such as
+NUMA domains and groups of hardware resources that share certain cache levels.
+
+**Basic syntax:** ``num_unitsID[@offset] [,num_unitsID[@offset]...]``
+
+Supported unit IDs are not case-insensitive.
+
+| ``S`` - socket
+| ``num_units`` specifies the requested number of sockets.
+
+| ``D`` - die
+| ``num_units`` specifies the requested number of dies per socket.
+
+| ``C`` - core
+| ``num_units`` specifies the requested number of cores per die - if any - otherwise, per socket.
+
+| ``T`` - thread
+| ``num_units`` specifies the requested number of HW threads per core.
+
+``offset`` - (Optional) The number of units to skip.
+
+.. note::
+    The hardware cache can be specified as a unit, e.g. L2 for L2 cache,
+    or LL for last level cache.
+
+**Extended syntax when KMP_TOPOLOGY_METHOD=hwloc:**
+
+Additional IDs can be specified if detected. For example:
+
+``N`` - numa
+``num_units`` specifies the requested number of NUMA nodes per upper layer
+unit, e.g. per socket.
+
+``TI`` - tile
+num_units specifies the requested number of tiles to use per upper layer
+unit, e.g. per NUMA node.
+
+When any numa or tile units are specified in ``KMP_HW_SUBSET`` and the hwloc
+topology method is available, the ``KMP_TOPOLOGY_METHOD`` will be automatically
+set to hwloc, so there is no need to set it explicitly.
+
+If you don't specify one or more types of resource, such as socket or thread,
+all available resources of that type are used.
+
+The run-time library prints a warning, and the setting of
+``KMP_HW_SUBSET`` is ignored if:
+
+* a resource is specified, but detection of that resource is not supported
+  by the chosen topology detection method and/or
+* a resource is specified twice.
+
+This variable does not work if ``KMP_AFFINITY=disabled``.
+
+**Default:** If omitted, the default value is to use all the
+available hardware resources.
+
+**Examples:**
+
+* ``2s,4c,2t``: Use the first 2 sockets (s0 and s1), the first 4 cores on each
+  socket (c0 - c3), and 2 threads per core.
+* ``2s@2,4c@8,2t``: Skip the first 2 sockets (s0 and s1) and use 2 sockets
+  (s2-s3), skip the first 8 cores (c0-c7) and use 4 cores on each socket
+  (c8-c11), and use 2 threads per core.
+* ``5C@1,3T``: Use all available sockets, skip the first core and use 5 cores,
+  and use 3 threads per core.
+* ``1T``: Use all cores on all sockets, 1 thread per core.
+* ``1s, 1d, 1n, 1c, 1t``: Use 1 socket, 1 die, 1 NUMA node, 1 core, 1 thread
+  - use HW thread as a result.
+* ``1s, 1c, 1t``: Use 1 socket, 1 core, 1 thread. This may result in using
+  single thread on a 3-layer topology architecture, or multiple threads on
+  4-layer or 5-layer architecture. Result may even be different on the same
+  architecture, depending on ``KMP_TOPOLOGY_METHOD`` specified, as hwloc can
+  often detect more topology layers than the default method used by the OpenMP
+  run-time library.
+
+To see the result of the setting, you can specify ``verbose`` modifier in
+``KMP_AFFINITY`` environment variable. The OpenMP run-time library will output
+to ``stderr`` the information about the discovered hardware topology before and
+after the ``KMP_HW_SUBSET`` setting was applied.
+
+KMP_INHERIT_FP_CONTROL
+""""""""""""""""""""""
+
+Enables (``true``) or disables (``false``) the copying of the floating-point
+control settings of the primary thread to the floating-point control settings
+of the OpenMP worker threads at the start of each parallel region.
+
+**Default:** ``true``
+
+KMP_LIBRARY
+"""""""""""
+
+Selects the OpenMP run-time library execution mode. The values for this variable
+are ``serial``, ``turnaround``, or ``throughput``.
+
+| **Default:** ``throughput``
+| **Related environment variable:** ``KMP_BLOCKTIME`` and ``OMP_WAIT_POLICY``
+
+KMP_SETTINGS
+""""""""""""
+
+Enables (``true``) or disables (``false``) the printing of OpenMP run-time library
+environment variables during program execution. Two lists of variables are printed:
+user-defined environment variables settings and effective values of variables used
+by OpenMP run-time library.
+
+**Default:** ``false``
+
+KMP_STACKSIZE
+"""""""""""""
+
+Sets the number of bytes to allocate for each OpenMP thread to use as its private stack.
+
+Recommended size is ``16M``.
+
+Use the optional suffixes to specify byte units: ``B`` (bytes), ``K`` (Kilobytes),
+``M`` (Megabytes), ``G`` (Gigabytes), or ``T`` (Terabytes) to specify the units.
+If you specify a value without a suffix, the byte unit is assumed to be K (Kilobytes).
+
+**Related environment variable:** ``KMP_STACKSIZE`` overrides ``GOMP_STACKSIZE``, which
+overrides ``OMP_STACKSIZE``.
+
+**Default:**
+
+* 32-bit architectures: ``2M``
+* 64-bit architectures: ``4M``
+
+KMP_TOPOLOGY_METHOD
+"""""""""""""""""""
+
+Forces OpenMP to use a particular machine topology modeling method.
+
+Possible values are:
+
+* ``all`` - Let OpenMP choose which topology method is most appropriate
+  based on the platform and possibly other environment variable settings.
+* ``cpuid_leaf31`` (x86 only) - Decodes the APIC identifiers as specified by leaf 31 of the
+  cpuid instruction. The runtime will produce an error if the machine does not support leaf 31.
+* ``cpuid_leaf11`` (x86 only) - Decodes the APIC identifiers as specified by leaf 11 of the
+  cpuid instruction. The runtime will produce an error if the machine does not support leaf 11.
+* ``cpuid_leaf4`` (x86 only) - Decodes the APIC identifiers as specified in leaf 4
+  of the cpuid instruction. The runtime will produce an error if the machine does not support leaf 4.
+* ``cpuinfo`` - If ``KMP_CPUINFO_FILE`` is not specified, forces OpenMP to
+  parse :file:`/proc/cpuinfo` to determine the topology (Linux only).
+  If ``KMP_CPUINFO_FILE`` is specified as described above, uses it (Windows or Linux).
+* ``group`` - Models the machine as a 2-level map, with level 0 specifying the
+  different processors in a group, and level 1 specifying the different
+  groups (Windows 64-bit only).
+
+.. note::
+    Support for group is now deprecated and will be removed in a future release. Use all instead.
+
+* ``flat`` - Models the machine as a flat (linear) list of processors.
+* ``hwloc`` - Models the machine as the Portable Hardware Locality (hwloc) library does.
+  This model is the most detailed and includes, but is not limited to: numa domains,
+  packages, cores, hardware threads, caches, and Windows processor groups. This method is
+  only available if you have configured libomp to use hwloc during CMake configuration.
+
+**Default:** all
+
+KMP_VERSION
+"""""""""""
+
+Enables (``true``) or disables (``false``) the printing of OpenMP run-time
+library version information during program execution.
+
+**Default:** ``false``
+
+KMP_WARNINGS
+""""""""""""
+
+Enables (``true``) or disables (``false``) displaying warnings from the
+OpenMP run-time library during program execution.
+
+**Default:** ``true``
 
 LLVM/OpenMP Target Host Runtime (``libomptarget``)
 --------------------------------------------------