
 ===================================
 Expected Differences vs DXC and FXC
 ===================================

 .. contents::
    :local:

 Introduction
 ============

 HLSL currently has two reference compilers, the `DirectX Shader Compiler (DXC)
 <https://github.com/microsoft/DirectXShaderCompiler/>`_ and the
 `Effect-Compiler (FXC) <https://learn.microsoft.com/en-us/windows/win32/direct3dtools/fxc>`_.
 The two reference compilers do not fully agree. Some known disagreements in the
 references are tracked on
 `DXC's GitHub
 <https://github.com/microsoft/DirectXShaderCompiler/issues?q=is%3Aopen+is%3Aissue+label%3Afxc-disagrees>`_,
 but many more are known to exist.

 HLSL as implemented by Clang will also not fully match either of the reference
 implementations; it is instead being written to match the `draft language
 specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_.

 This document is a non-exhaustive collection of the known differences between
 Clang's implementation of HLSL and the existing reference compilers.

 General Principles
 ------------------

 Most of the intended differences between Clang and the earlier reference
 compilers are focused on increased consistency and correctness. Neither
 reference compiler applies language rules consistently across all contexts.

 Clang also deviates from the reference compilers by providing different
 diagnostics, both in terms of the textual messages and the contexts in which
 diagnostics are produced. While striving for a high level of source
 compatibility with conforming HLSL code, Clang may produce earlier and more
 robust diagnostics for incorrect code or reject code that a reference compiler
 incorrectly accepted.

 Language Version
 ================

 Clang targets language compatibility for HLSL 2021 as implemented by DXC.
 Language features that were removed in earlier versions of HLSL may be added on
 a case-by-case basis, but are not planned for the initial implementation.

 Overload Resolution
 ===================

 Clang's HLSL implementation adopts C++ overload resolution rules as proposed for
 HLSL 202x, based on proposals
 `0007 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0007-const-instance-methods.md>`_
 and
 `0008 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_.

 The largest difference between Clang and DXC's overload resolution is the
 algorithm used for identifying best-match overloads. There are more details
 about the algorithmic differences in the :ref:`multi_argument_overloads` section
 below. Three high-level differences are worth highlighting:

 * **There should be no cases** where DXC and Clang both successfully
   resolve an overload where the resolved overload is different between the two.
 * There are cases where Clang will successfully resolve an overload that DXC
   wouldn't because we've trimmed the overload set in Clang to remove ambiguity.
 * There are cases where DXC will successfully resolve an overload that Clang
   will not for two reasons: (1) DXC only generates partial overload sets for
   builtin functions and (2) DXC resolves cases that probably should be ambiguous.

 Clang's implementation extends standard overload resolution rules to HLSL
 library functionality. This causes subtle changes in overload resolution
 behavior between Clang and DXC. Some examples include:

 .. code-block:: c++

   void halfOrInt16(half H);
   void halfOrInt16(uint16_t U);
   void halfOrInt16(int16_t I);

   void takesDoubles(double, double, double);

   cbuffer CB {
     bool B;
     uint U;
     int I;
     float X, Y, Z;
     double3 R, G;
   }

   void takesSingleDouble(double);
   void takesSingleDouble(vector<double, 1>);

   void scalarOrVector(double);
   void scalarOrVector(vector<double, 2>);

   export void call() {
     half H;
     halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t).

   #ifndef IGNORE_ERRORS
     halfOrInt16(U); // All: Fails with call ambiguous between int16_t and uint16_t
                     // overloads

     // asfloat16 is a builtin with overloads for half, int16_t, and uint16_t.
     H = asfloat16(I); // DXC: Fails to resolve overload for int.
                       // Clang: Resolves to asfloat16(int16_t).
     H = asfloat16(U); // DXC: Fails to resolve overload for int.
                       // Clang: Resolves to asfloat16(uint16_t).
   #endif
     H = asfloat16(0x01); // DXC: Resolves to asfloat16(half).
                          // Clang: Resolves to asfloat16(uint16_t).

     takesDoubles(X, Y, Z); // Works on all compilers
   #ifndef IGNORE_ERRORS
     fma(X, Y, Z); // DXC: Fails to resolve; no known conversion from float to
                   //   double.
                   // Clang: Resolves to fma(double,double,double).

     double D = dot(R, G); // DXC: Resolves to dot(double3, double3), fails DXIL Validation.
                           // FXC: Expands to compute double dot product with fmul/fadd
                           // Clang: Fails to resolve as ambiguous against
                           //   dot(half, half) or dot(float, float)
   #endif

   #ifndef IGNORE_ERRORS
     tan(B); // DXC: resolves to tan(float).
             // Clang: Fails to resolve, ambiguous between integer types.

   #endif

     double D;
     takesSingleDouble(D); // All: Fails to resolve ambiguous conversions.
     takesSingleDouble(R); // All: Fails to resolve ambiguous conversions.

     scalarOrVector(D); // All: Resolves to scalarOrVector(double).
     scalarOrVector(R); // All: Fails to resolve ambiguous conversions.
   }

 .. note::

   In Clang, a conscious decision was made to exclude the ``dot(vector<double,N>, vector<double,N>)``
   overload and allow overload resolution to resolve the
   ``vector<float,N>`` overload. This approach provides a ``-Wconversion``
   diagnostic notifying the user of the conversion rather than silently altering
   precision relative to the other overloads (as FXC does) or generating code
   that will fail validation (as DXC does).

 .. _multi_argument_overloads:

 Multi-Argument Overloads
 ------------------------

 In addition to the differences in single-element conversions, Clang and DXC
 differ dramatically in multi-argument overload resolution. C++ multi-argument
 overload resolution behavior (or something very similar) is required to
 implement
 `non-member operator overloading <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_.

 Clang adopts the C++-inspired language from the
 `draft HLSL specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_,
 where an overload ``f1`` is a better candidate than ``f2`` if, for every
 argument, the conversion sequence for ``f1`` is not worse than the corresponding
 conversion sequence for ``f2``, and for at least one argument it is better.

 .. code-block:: c++

   cbuffer CB {
     int I;
     float X;
     float4 V;
   }

   void twoParams(int, int);
   void twoParams(float, float);
   void threeParams(float, float, float);
   void threeParams(float4, float4, float4);

   export void call() {
     twoParams(I, X); // DXC: resolves twoParams(int, int).
                      // Clang: Fails to resolve ambiguous conversions.

     threeParams(X, V, V); // DXC: resolves threeParams(float4, float4, float4).
                           // Clang: Fails to resolve ambiguous conversions.
   }

 In the example above, calling ``twoParams`` with mixed arguments produces the
 implicit conversion sequences { ExactMatch, FloatingIntegral } and {
 FloatingIntegral, ExactMatch }. Each candidate has an argument whose conversion
 is worse than in the other candidate, so the call is ambiguous.

 In the ``threeParams`` example the sequences are { ExactMatch, VectorTruncation,
 VectorTruncation } and { VectorSplat, ExactMatch, ExactMatch }. Again, each
 candidate has at least one argument whose conversion is worse than in the other
 candidate, so the call is ambiguous.
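
 The pairwise rule above can be modeled with a short C++ sketch. This is an
 illustration only, not Clang's implementation: the ``Rank`` enumeration is
 hypothetical and collapses every non-exact conversion (``FloatingIntegral``,
 ``VectorSplat``, ``VectorTruncation``) into a single ``Conversion`` rank, and
 ``isBetter`` and ``resolve`` are names invented for this example.

 .. code-block:: c++

   #include <cstddef>
   #include <vector>

   // Hypothetical conversion ranks for illustration; lower is better.
   // This is an assumption, not Clang's actual ranking enumeration.
   enum Rank { ExactMatch = 0, Promotion = 1, Conversion = 2 };

   // One rank per argument of a call.
   using Sequence = std::vector<Rank>;

   // True if candidate A is better than candidate B: A is not worse for any
   // argument and is strictly better for at least one. Both sequences are
   // assumed to have the same length.
   bool isBetter(const Sequence &A, const Sequence &B) {
     bool StrictlyBetter = false;
     for (std::size_t I = 0; I < A.size(); ++I) {
       if (A[I] > B[I])
         return false; // A is worse for this argument.
       if (A[I] < B[I])
         StrictlyBetter = true;
     }
     return StrictlyBetter;
   }

   // Returns the index of the candidate that is better than every other
   // candidate, or -1 if no such candidate exists (the call is ambiguous).
   int resolve(const std::vector<Sequence> &Candidates) {
     for (std::size_t I = 0; I < Candidates.size(); ++I) {
       bool BeatsAll = true;
       for (std::size_t J = 0; J < Candidates.size(); ++J)
         if (I != J && !isBetter(Candidates[I], Candidates[J]))
           BeatsAll = false;
       if (BeatsAll)
         return static_cast<int>(I);
     }
     return -1;
   }

 With these definitions,
 ``resolve({{ExactMatch, Conversion}, {Conversion, ExactMatch}})`` models the
 ``twoParams(I, X)`` call and yields ``-1``, because neither candidate is at
 least as good as the other for every argument.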

 .. note::

   The behavior of DXC described below is not officially documented; it is
   gleaned from observation and some reading of the source.

 DXC's approach for determining the best overload produces an integer score value
 for each implicit conversion sequence for each argument expression. Scores for
 casts are based on a bitmask construction that is complicated to reverse
 engineer. It seems that:

 * Exact match is 0
 * Dimension increase is 1
 * Promotion is 2
 * Integral -> Float conversion is 4
 * Float -> Integral conversion is 8
 * Cast is 16

 The masks are bitwise OR'd together to produce a score for the cast.

 The scores of the conversion sequences are then summed to generate a score for
 the overload candidate. The overload candidate with the lowest score is the best
 candidate. If more than one overload shares the lowest score, the call is
 ambiguous.
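
 The scoring scheme described above can be sketched in C++ as follows. This is
 a minimal model that assumes the guessed bit values in the list are accurate;
 the enumerator and function names are hypothetical and do not correspond to
 identifiers in DXC's source.

 .. code-block:: c++

   #include <cstddef>
   #include <vector>

   // Score bits as guessed in the list above; an assumption, not values taken
   // from DXC's source.
   enum ScoreBits : unsigned {
     ScoreExact = 0,
     ScoreDimensionIncrease = 1,
     ScorePromotion = 2,
     ScoreIntToFloat = 4,
     ScoreFloatToInt = 8,
     ScoreCast = 16,
   };

   // One entry per argument; each entry is the OR'd mask for that argument's
   // implicit conversion sequence.
   using ArgScores = std::vector<unsigned>;

   // Sums the per-argument scores to get the score of one overload candidate.
   unsigned candidateScore(const ArgScores &Args) {
     unsigned Sum = 0;
     for (unsigned S : Args)
       Sum += S;
     return Sum;
   }

   // Returns the index of the unique lowest-scoring candidate, or -1 if the
   // lowest score is shared (ambiguous) or there are no candidates.
   int pickLowest(const std::vector<ArgScores> &Candidates) {
     int Best = -1;
     unsigned BestScore = 0;
     bool Tie = false;
     for (std::size_t I = 0; I < Candidates.size(); ++I) {
       unsigned S = candidateScore(Candidates[I]);
       if (Best == -1 || S < BestScore) {
         Best = static_cast<int>(I);
         BestScore = S;
         Tie = false;
       } else if (S == BestScore) {
         Tie = true;
       }
     }
     return Tie ? -1 : Best;
   }

 In this model, a candidate whose arguments need { exact match, promotion } sums
 to 2 and beats one that needs { integral-to-float conversion, exact match },
 which sums to 4. Two candidates that both sum to 2 tie, and the sketch reports
 the call as ambiguous.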