| =================================== |
| Expected Differences vs DXC and FXC |
| =================================== |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| |
| HLSL currently has two reference compilers, the `DirectX Shader Compiler (DXC) |
| <https://github.com/microsoft/DirectXShaderCompiler/>`_ and the |
| `Effect-Compiler (FXC) <https://learn.microsoft.com/en-us/windows/win32/direct3dtools/fxc>`_. |
| The two reference compilers do not fully agree. Some known disagreements in the |
| references are tracked on |
| `DXC's GitHub |
| <https://github.com/microsoft/DirectXShaderCompiler/issues?q=is%3Aopen+is%3Aissue+label%3Afxc-disagrees>`_, |
| but many more are known to exist. |
| |
| HLSL as implemented by Clang will also not fully match either of the reference |
| implementations; it is instead being written to match the `draft language |
| specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_. |
| |
| This document is a non-exhaustive collection of the known differences between |
| Clang's implementation of HLSL and the existing reference compilers. |
| |
| General Principles |
| ------------------ |
| |
| Most of the intended differences between Clang and the earlier reference |
| compilers are focused on increased consistency and correctness. Neither |
| reference compiler applies language rules consistently in all contexts. |
| |
| Clang also deviates from the reference compilers by providing different |
| diagnostics, both in terms of the textual messages and the contexts in which |
| diagnostics are produced. While striving for a high level of source |
| compatibility with conforming HLSL code, Clang may produce earlier and more |
| robust diagnostics for incorrect code or reject code that a reference compiler |
| incorrectly accepted. |
| |
| Language Version |
| ================ |
| |
| Clang targets language compatibility for HLSL 2021 as implemented by DXC. |
| Language features that were removed in earlier versions of HLSL may be added on |
| a case-by-case basis, but are not planned for the initial implementation. |
| |
| Overload Resolution |
| =================== |
| |
| Clang's HLSL implementation adopts C++ overload resolution rules as proposed for |
| HLSL 202x based on proposals |
| `0007 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0007-const-instance-methods.md>`_ |
| and |
| `0008 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_. |
| |
| The largest difference between Clang and DXC's overload resolution is the |
| algorithm used for identifying best-match overloads. There are more details |
| about the algorithmic differences in the :ref:`multi_argument_overloads` section |
| below. There are three high-level differences that should be highlighted: |
| |
| * **There should be no cases** where DXC and Clang both successfully |
| resolve an overload where the resolved overload is different between the two. |
| * There are cases where Clang will successfully resolve an overload that DXC |
| wouldn't because we've trimmed the overload set in Clang to remove ambiguity. |
| * There are cases where DXC will successfully resolve an overload that Clang |
| will not for two reasons: (1) DXC only generates partial overload sets for |
| builtin functions and (2) DXC resolves cases that probably should be ambiguous. |
| |
| Clang's implementation extends standard overload resolution rules to HLSL |
| library functionality. This causes subtle changes in overload resolution |
| behavior between Clang and DXC. Some examples include: |
| |
| .. code-block:: c++ |
| |
| void halfOrInt16(half H); |
| void halfOrInt16(uint16_t U); |
| void halfOrInt16(int16_t I); |
| |
| void takesDoubles(double, double, double); |
| |
| cbuffer CB { |
| bool B; |
| uint U; |
| int I; |
| float X, Y, Z; |
| double3 R, G; |
| } |
| |
| void takesSingleDouble(double); |
| void takesSingleDouble(vector<double, 1>); |
| |
| void scalarOrVector(double); |
| void scalarOrVector(vector<double, 2>); |
| |
| export void call() { |
| half H; |
| halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t). |
| |
| #ifndef IGNORE_ERRORS |
| halfOrInt16(U); // All: Fails with call ambiguous between int16_t and uint16_t |
| // overloads |
| |
| // asfloat16 is a builtin with overloads for half, int16_t, and uint16_t. |
| H = asfloat16(I); // DXC: Fails to resolve overload for int. |
| // Clang: Resolves to asfloat16(int16_t). |
| H = asfloat16(U); // DXC: Fails to resolve overload for uint. |
| // Clang: Resolves to asfloat16(uint16_t). |
| #endif |
| H = asfloat16(0x01); // DXC: Resolves to asfloat16(half). |
| // Clang: Resolves to asfloat16(uint16_t). |
| |
| takesDoubles(X, Y, Z); // Works on all compilers |
| #ifndef IGNORE_ERRORS |
| fma(X, Y, Z); // DXC: Fails to resolve; no known conversion from float to |
| // double. |
| // Clang: Resolves to fma(double,double,double). |
| |
| double Dot = dot(R, G); // DXC: Resolves to dot(double3, double3), fails DXIL Validation. |
| // FXC: Expands to compute double dot product with fmul/fadd |
| // Clang: Fails to resolve as ambiguous against |
| // dot(half, half) or dot(float, float) |
| #endif |
| |
| #ifndef IGNORE_ERRORS |
| tan(B); // DXC: resolves to tan(float). |
| // Clang: Fails to resolve, ambiguous between integer types. |
| |
| #endif |
| |
| double D; |
| takesSingleDouble(D); // All: Fails to resolve ambiguous conversions. |
| takesSingleDouble(R); // All: Fails to resolve ambiguous conversions. |
| |
| scalarOrVector(D); // All: Resolves to scalarOrVector(double). |
| scalarOrVector(R); // All: Fails to resolve ambiguous conversions. |
| } |
| |
| .. note:: |
| |
| In Clang, a conscious decision was made to exclude the ``dot(vector<double,N>, vector<double,N>)`` |
| overload and allow overload resolution to resolve the |
| ``vector<float,N>`` overload. This approach provides a ``-Wconversion`` |
| diagnostic notifying the user of the conversion rather than silently altering |
| precision relative to the other overloads (as FXC does) or generating code |
| that will fail validation (as DXC does). |
| |
| .. _multi_argument_overloads: |
| |
| Multi-Argument Overloads |
| ------------------------ |
| |
| In addition to the differences in single-element conversions, Clang and DXC |
| differ dramatically in multi-argument overload resolution. C++ multi-argument |
| overload resolution behavior (or something very similar) is required to |
| implement |
| `non-member operator overloading <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_. |
| |
| Clang adopts the C++-inspired language from the |
| `draft HLSL specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_, |
| where an overload ``f1`` is a better candidate than ``f2`` if, for every |
| argument, the conversion sequence of ``f1`` is not worse than the corresponding |
| conversion sequence of ``f2``, and for at least one argument it is better. |
| |
| .. code-block:: c++ |
| |
| cbuffer CB { |
| int I; |
| float X; |
| float4 V; |
| } |
| |
| void twoParams(int, int); |
| void twoParams(float, float); |
| void threeParams(float, float, float); |
| void threeParams(float4, float4, float4); |
| |
| export void call() { |
| twoParams(I, X); // DXC: resolves twoParams(int, int). |
| // Clang: Fails to resolve ambiguous conversions. |
| |
| threeParams(X, V, V); // DXC: resolves threeParams(float4, float4, float4). |
| // Clang: Fails to resolve ambiguous conversions. |
| } |
| |
| In the example above, calling ``twoParams`` with mixed arguments produces |
| implicit conversion sequences of { ExactMatch, FloatingIntegral } and { |
| FloatingIntegral, ExactMatch }. Each candidate has one argument whose conversion |
| is worse than in the other candidate, so the call is ambiguous. |
| |
| In the ``threeParams`` example the sequences are { ExactMatch, VectorTruncation, |
| VectorTruncation } and { VectorSplat, ExactMatch, ExactMatch }. Again, each |
| candidate has at least one argument whose conversion is worse than in the other |
| candidate, so the call is ambiguous. |
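
The "not worse for every argument, better for at least one" rule can be
sketched in plain C++. This is a minimal illustration, not Clang's
implementation: ``Rank``, ``isBetterCandidate`` and ``isAmbiguous`` are
hypothetical names, each conversion sequence is reduced to a single rank per
argument, and a lower rank is assumed to mean a better conversion.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified ranks: a lower value means a better conversion.
enum Rank { ExactMatch = 0, Promotion = 1, Conversion = 2 };

// Returns true if candidate F1 is better than F2: no argument converts worse,
// and at least one argument converts strictly better.
// Assumes both candidates have the same number of arguments.
bool isBetterCandidate(const std::vector<Rank> &F1,
                       const std::vector<Rank> &F2) {
  bool StrictlyBetter = false;
  for (std::size_t I = 0; I < F1.size(); ++I) {
    if (F1[I] > F2[I])
      return false; // F1 is worse for this argument, so it cannot win.
    if (F1[I] < F2[I])
      StrictlyBetter = true;
  }
  return StrictlyBetter;
}

// The call is ambiguous when neither candidate is better than the other.
bool isAmbiguous(const std::vector<Rank> &F1, const std::vector<Rank> &F2) {
  return !isBetterCandidate(F1, F2) && !isBetterCandidate(F2, F1);
}
```

Under these assumptions, a pair of sequences shaped like the ``twoParams``
case, ``{ExactMatch, Conversion}`` against ``{Conversion, ExactMatch}``, is
reported as ambiguous, because each candidate is worse than the other for one
argument.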
| |
| .. note:: |
| |
| DXC's behavior described below is not officially documented; this description |
| is gleaned from observation and a bit of reading the source. |
| |
| DXC's approach for determining the best overload produces an integer score value |
| for each implicit conversion sequence for each argument expression. Scores for |
| casts are based on a bitmask construction that is complicated to reverse |
| engineer. It seems that: |
| |
| * Exact match is 0 |
| * Dimension increase is 1 |
| * Promotion is 2 |
| * Integral -> Float conversion is 4 |
| * Float -> Integral conversion is 8 |
| * Cast is 16 |
| |
| The masks are combined with a bitwise OR to produce a score for the cast. |
| |
| The scores of the conversion sequences are then summed to generate a score for |
| the overload candidate. The overload candidate with the lowest score is the best |
| candidate. If more than one overload candidate shares the lowest score, the call |
| is ambiguous. |
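
The summed-score selection can be sketched in plain C++ as well. This is a
rough model only, not DXC's code: ``CastMask``, ``candidateScore`` and
``pickBest`` are hypothetical names, and the mask values are the inferred ones
listed above, so real DXC results may differ.

```cpp
#include <cstddef>
#include <vector>

// Inferred score bits from the list above; the values are not confirmed.
enum CastMask : unsigned {
  Exact = 0,
  DimensionIncrease = 1,
  Promotion = 2,
  IntegralToFloat = 4,
  FloatToIntegral = 8,
  Cast = 16
};

// Sums the per-argument cast scores of one overload candidate.
unsigned candidateScore(const std::vector<unsigned> &ArgScores) {
  unsigned Total = 0;
  for (unsigned Score : ArgScores)
    Total += Score;
  return Total;
}

// Returns the index of the unique lowest-scoring candidate, or -1 when the
// candidate list is empty or the lowest score is shared (ambiguous call).
int pickBest(const std::vector<std::vector<unsigned>> &Candidates) {
  int Best = -1;
  unsigned BestScore = 0;
  bool Tied = false;
  for (std::size_t I = 0; I < Candidates.size(); ++I) {
    unsigned Score = candidateScore(Candidates[I]);
    if (Best < 0 || Score < BestScore) {
      Best = static_cast<int>(I);
      BestScore = Score;
      Tied = false;
    } else if (Score == BestScore) {
      Tied = true;
    }
  }
  return Tied ? -1 : Best;
}
```

A cast that involves more than one kind of change is modeled by OR-ing the
masks, for example ``Promotion | DimensionIncrease``, before the value is
placed in a candidate's list of argument scores.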