This document describes the changes made to the LLVM and Clang codebases to support HLSL, the High-Level Shading Language. The focus is on design changes and the general approach rather than on the specific changes made to support the language and runtime.
This section outlines the considerations that have prompted multiple changes.
- Provide a successor to the fxc.exe and d3dcompiler_47.dll components. This means providing a DLL that can be used in a variety of situations, as well as a command-line program to access its functionality.
- The requirement to act as a proper DLL means that it shouldn't rely on process-wide state, as this prevents callers from using it from more than one thread concurrently. This includes changing the current directory, using stdin/stdout/stderr, or deciding to terminate the process (except for fatal conditions).
- There are scenarios in which applications may choose to redistribute the built DLL, for example an IDE for writing shaders, tools to instrument and debug them, or a game that chooses to emit shaders at runtime. This means that compilation time and DLL/program size are important considerations.
- Because the new DLL and command-line program are meant to replace specific existing components, it's desirable to keep their interfaces similar to the prior versions (the API in the case of the DLL, the command-line format in the case of the program).
This section describes why and how the HLSL on LLVM project has forked LLVM and Clang.
LLVM and Clang provide an excellent starting point for building a compiler, especially one that will exist in an ecosystem of organizations contributing to the pipeline, whether it be in the form of tooling and abstractions by middleware and tool authors, or backend compilers by hardware vendors.
The decision was made to fork LLVM and Clang rather than work directly and upstream all changes for the following reasons:
- While HLSL started out as a C derivative, over time it has drifted away, not to the extent that C++ and Objective-C have, but certainly a fair bit. Furthermore, some of the behavior was never compatible to begin with, and so there are significant differences, especially in the type system, that make upstreaming difficult.
- HLSL is expected to evolve over the next few years to shed some of the incompatibilities that exist for historical reasons (as opposed to those that provide actual developer benefit). It's entirely possible that a future version will be much more closely aligned to C/C++ semantics and could be more easily adapted. This version of the codebase isn't it.
- The changes to LLVM are meaningless without an execution model, which is currently being worked on. Changes to LLVM are more likely to get upstreamed; however, we have chosen not to consider this until we have multiple implementations of DXIL backends.
We have already done a 3.4 to 3.7 upgrade in the sources, and it's entirely possible that this will happen again. Therefore, all changes introduced are either made in entirely new files, marked with an 'HLSL Change' comment, or bracketed by a pair of 'HLSL Change Starts' / 'HLSL Change Ends' comments. This makes integrations easier (or rather, less difficult).
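As an illustration of the marker convention, a change inside an existing upstream file might look like the following sketch. The function and its contents are hypothetical and exist only to give the comments something to annotate; only the marker text itself comes from the convention described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical upstream function, shown only to illustrate the markers.
std::string DescribeTarget(bool isHlsl) {
  std::string name = "generic"; // untouched upstream line

  // HLSL Change Starts
  // A multi-line modification is bracketed by a Starts/Ends pair so that a
  // later merge from upstream can locate it with a plain text search.
  if (isHlsl) {
    name = "dxil";
  }
  // HLSL Change Ends

  name += "-unknown"; // HLSL Change
  return name;
}
```

Because every modification carries one of these strings, a text search for "HLSL Change" across the tree gives a rough inventory of what has to be re-examined during the next upstream integration.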
One of the goals of dxcompiler is to be a reusable component for a variety of applications. While Clang and LLVM have support for being hosted as a dynamically loaded library, they still make a number of assumptions that make such usage problematic, specifically:
- Usage of process-wide handles (stdin/stdout/stderr).
- Reliance on file system access (clang has some level of virtual file system support in the later versions), including temporary files.
- Direct usage of memory allocation mechanisms.
- Reliance on other process-wide constructs like environment variables.
Redesigning a number of these mechanisms would require changes throughout the codebase, and it's hard to tell whether any regressions are introduced when a large number of changes are integrated.
The solution we have implemented relies on having a thread-local component that can service I/O requests as well as other OS-implemented or process-wide constructs. The library is then meant to be used through specific API points that will set up and tear down this component as appropriate. The MSFileSystem class in include/llvm/Support/MSFileSystem.h provides the access point (although the API should likely be renamed, as it's broader than a pure file system abstraction, and 'MS' monikers are being changed to 'HLSL').
To guard against regressions, we can simply verify that no library references an OS API that has been virtualized, such as CreateFile. If such a reference is found, the call can be replaced with its drop-in virtualized equivalent.
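The thread-local approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the MSFileSystem API: the names IoService, ScopedIoService, VirtualReadFile and InMemoryIo are all hypothetical, and a real implementation would cover many more operations.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical abstraction over the OS services the compiler would otherwise
// call directly. This is NOT the MSFileSystem interface.
class IoService {
public:
  virtual ~IoService() = default;
  virtual std::string ReadFile(const std::string &path) = 0;
  virtual void WriteStdErr(const std::string &text) = 0;
};

// One pointer per thread, so two threads can compile concurrently with
// different hosts without sharing any process-wide state.
static thread_local IoService *g_threadIo = nullptr;

// RAII guard used at an API entry point: installs the service on entry and
// restores the previous one on exit, even if an exception propagates.
class ScopedIoService {
public:
  explicit ScopedIoService(IoService *io) : previous_(g_threadIo) {
    g_threadIo = io;
  }
  ~ScopedIoService() { g_threadIo = previous_; }
  ScopedIoService(const ScopedIoService &) = delete;
  ScopedIoService &operator=(const ScopedIoService &) = delete;

private:
  IoService *previous_;
};

// Drop-in replacement that library code calls instead of touching the OS.
std::string VirtualReadFile(const std::string &path) {
  if (!g_threadIo)
    throw std::logic_error("no I/O service installed on this thread");
  return g_threadIo->ReadFile(path);
}

// Example host: serves files from memory and captures diagnostics.
class InMemoryIo : public IoService {
public:
  std::map<std::string, std::string> files;
  std::string log;

  std::string ReadFile(const std::string &path) override {
    auto it = files.find(path);
    if (it == files.end())
      throw std::runtime_error("file not found: " + path);
    return it->second;
  }
  void WriteStdErr(const std::string &text) override { log += text; }
};

// Usage: the service is visible to library code only for the scope's lifetime.
inline void Example() {
  InMemoryIo host;
  host.files["shader.hlsl"] = "// empty shader";
  ScopedIoService scope(&host);
  assert(VirtualReadFile("shader.hlsl") == "// empty shader");
}
```

Keeping the pointer thread-local rather than global is what lets two callers on different threads use different hosts at the same time; the scoped guard keeps setup and teardown paired at the API boundary.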
Some of the tasks that are simplified include:
- Host in-process in an IDE to provide tooling services.
- Guarantee that no process execution takes place, and that the host environment does not influence the tools unless the caller explicitly opts in.
- Provide virtualized I/O for all kinds of file access, including redirecting to in-memory buffers.
- Override memory allocation to constrain memory consumption.
At the moment, memory allocation is still not redirected, but I/O has been cleaned up.
Clang and LLVM already provide a number of error handling mechanisms: returning a bool flag to indicate success or failure, returning a structured object that flags the result along with other information, using STL system errors for certain errors, or using errno for some C standard library calls.
There are two other kinds of error handling mechanisms introduced by HLSL on LLVM.
- C++ Exceptions. The primary use case for these is to handle out-of-memory exceptions. A second case is to handle cases where LLVM and Clang today attempt to terminate the process even though they are in a recoverable state (for example, when a '--help' switch is found in command-line options).
- HRESULT. The APIs are designed to be familiar to Windows developers and to provide a simple transition for d3dcompiler_47.dll users. As such, error codes are typically returned as HRESULT values that can be tested with the SUCCEEDED/FAILED macros.
Some of the code that is written to interface with the rest of the system can also make use of these error handling strategies. A number of macros to handle them, or to convert one into the other, are available in include/dxc/Support/Global.h.
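The conversion from exceptions to HRESULT values at an API boundary can be sketched as follows. This is an illustration only and does not reproduce the macros in Global.h: GuardedCall and HResultException are hypothetical names, and the HRESULT type and constants are redefined locally (with their standard Windows values) so the example compiles without Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <new>
#include <stdexcept>

namespace sketch {
// Local stand-ins for the Windows definitions, using the standard values.
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);
constexpr HRESULT E_OUTOFMEMORY = static_cast<HRESULT>(0x8007000Eu);
constexpr bool SUCCEEDED(HRESULT hr) { return hr >= 0; }
constexpr bool FAILED(HRESULT hr) { return hr < 0; }

// Hypothetical exception type that carries an HRESULT, for code that detects
// a failure deep in the call stack and wants it reported unchanged.
struct HResultException : std::exception {
  explicit HResultException(HRESULT value) : hr(value) {}
  const char *what() const noexcept override { return "HRESULT failure"; }
  HRESULT hr;
};

// Runs `body` and translates any escaping exception into an HRESULT, so that
// no C++ exception crosses the DLL boundary.
template <typename Fn>
HRESULT GuardedCall(Fn &&body) noexcept {
  try {
    body();
    return S_OK;
  } catch (const std::bad_alloc &) {
    return E_OUTOFMEMORY; // the primary use case named above
  } catch (const HResultException &e) {
    return e.hr;
  } catch (...) {
    return E_FAIL; // anything else becomes a generic failure
  }
}
} // namespace sketch
```

A caller then checks the result the same way it would for any other HRESULT-returning API, for example `if (FAILED(hr)) { ... }`.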
Removing unused functionality can help reduce the binary size, improve performance, and speed up compilation time for the project. However, this has to be traded off against changing the behavior of LLVM and Clang (and the cost of understanding this change for developers who are familiar with those projects), as well as the future maintenance for integrations.
The recommendation is to avoid removing small bits of functionality, and to remove only significant subsystems that can be "sliced off" cleanly (for example, the interpreter component or target support).
The dxcompiler DLL is designed to export a number of components that can be reused in different contexts. The API is exposed as a lightweight form of the Microsoft Component Object Model (COM); a similar approach can be seen in the design of the xmllite library.
The functionality of the library is encapsulated in discrete components, each of which is embodied in an object that implements one or more interfaces. Interfaces are derived from IUnknown as in COM and are responsible for interface discovery and lifetime management. Object construction is done via the single exported API, DxcCreateObject, which acts much like DllCreateObject would in a COM library.
Interfaces are mostly COM-compatible and have been designed to be easy to use from other languages that can consume COM libraries, such as the .NET runtime or C++ applications. Importantly, memory that is allocated in the library and should be freed by the consumer is allocated with the COM allocator, through CoTaskMemAlloc or the use of IMalloc via CoGetMalloc().
Note that this lightweight COM support implies that some features are missing:
- There is no support for marshalling across COM apartments.
- There is (at the moment at least) no management of library references based on outstanding objects (a typical bug that would arise from this would be, for example, unloading dxcompiler while outstanding objects exist, at which point even releasing them would lead to an access violation).
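The lightweight COM pattern described above (interface discovery, reference counting, and a single creation entry point) can be sketched in portable C++ as follows. Every name here is hypothetical and none of it is the dxcompiler API; in particular, real COM identifies classes and interfaces by 128-bit GUIDs, which this sketch replaces with small enums for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch {
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002u);
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Stand-ins for GUID-based identifiers.
enum class InterfaceId { Unknown, Compiler };
enum class ClassId { Compiler, Library }; // Library is not implemented here

struct IUnknownLike {
  virtual HRESULT QueryInterface(InterfaceId iid, void **out) = 0;
  virtual std::uint32_t AddRef() = 0;
  virtual std::uint32_t Release() = 0;

protected:
  ~IUnknownLike() = default; // lifetime is managed only through Release()
};

struct ICompilerLike : IUnknownLike {
  virtual bool IsNonEmpty(const char *source) = 0; // placeholder operation
};

class Compiler final : public ICompilerLike {
public:
  HRESULT QueryInterface(InterfaceId iid, void **out) override {
    if (iid == InterfaceId::Unknown) {
      *out = static_cast<IUnknownLike *>(this);
    } else if (iid == InterfaceId::Compiler) {
      *out = static_cast<ICompilerLike *>(this);
    } else {
      *out = nullptr;
      return E_NOINTERFACE;
    }
    AddRef(); // the returned pointer owns one reference
    return S_OK;
  }
  std::uint32_t AddRef() override { return ++refs_; }
  std::uint32_t Release() override {
    const std::uint32_t remaining = --refs_;
    if (remaining == 0)
      delete this;
    return remaining;
  }
  bool IsNonEmpty(const char *source) override { return source && *source; }

private:
  ~Compiler() = default;
  std::atomic<std::uint32_t> refs_{0};
};

// The single creation entry point: maps a class id to an object and hands
// back the requested interface with one reference owned by the caller.
HRESULT CreateObject(ClassId clsid, InterfaceId iid, void **out) {
  *out = nullptr;
  if (clsid != ClassId::Compiler)
    return E_FAIL;
  Compiler *obj = new Compiler();
  obj->AddRef();                              // temporary reference
  const HRESULT hr = obj->QueryInterface(iid, out);
  obj->Release();                             // destroys obj if the query failed
  return hr;
}

// Usage: create, call, release.
inline bool Example() {
  void *raw = nullptr;
  if (CreateObject(ClassId::Compiler, InterfaceId::Compiler, &raw) != S_OK)
    return false;
  auto *compiler = static_cast<ICompilerLike *>(raw);
  const bool ok = compiler->IsNonEmpty("float4 main() : SV_Target { return 0; }");
  compiler->Release(); // last reference: the object deletes itself
  return ok;
}
} // namespace sketch
```

Note that nothing in this sketch ties object lifetime to the lifetime of the library itself, which mirrors the second limitation listed above.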
Traditionally, the D3D compilers have used an ID3DBlob interface to encapsulate a buffer. The HLSL compiler avoids pulling in DirectX headers and defines an IDxcBlob interface that has the same layout and interface identifier (IID).
Buffers are often used to hold text, for example shader sources or compilation logs. IDxcBlobEncoding inherits from IDxcBlob and has functionality to declare the encoding for the buffer.
The design principle for using a character pointer or an IDxcBlobEncoding is as follows: for internal dxcompiler text, UTF-8 and char* are used; for API parameters that are "short" such as file names or command line parameters, UTF-16 and wchar_t* are used; for longer text such as source files or error logs, IDxcBlobEncoding is used.
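The relationship between a plain blob and a blob that declares its encoding can be sketched as follows. The types below are simplified, hypothetical stand-ins rather than the real IDxcBlob and IDxcBlobEncoding declarations: they omit the IUnknown base and reference counting, and use portable types in place of the Windows ones. The code page constant is the value Windows assigns to UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

namespace sketch {
constexpr std::uint32_t kCodePageUtf8 = 65001; // value of CP_UTF8 on Windows

// A buffer: a pointer and a size, nothing more.
struct BlobLike {
  virtual ~BlobLike() = default;
  virtual const void *GetBufferPointer() const = 0;
  virtual std::size_t GetBufferSize() const = 0;
};

// A buffer that can additionally say how its bytes should be read as text.
struct BlobEncodingLike : BlobLike {
  virtual void GetEncoding(bool *known, std::uint32_t *codePage) const = 0;
};

// Example implementation holding UTF-8 text such as a compilation log.
class Utf8TextBlob final : public BlobEncodingLike {
public:
  explicit Utf8TextBlob(std::string text) : text_(std::move(text)) {}
  const void *GetBufferPointer() const override { return text_.data(); }
  std::size_t GetBufferSize() const override { return text_.size(); }
  void GetEncoding(bool *known, std::uint32_t *codePage) const override {
    *known = true;
    *codePage = kCodePageUtf8;
  }

private:
  std::string text_;
};

// A consumer that interprets the bytes as text only when the blob declares
// UTF-8; any other or unknown encoding yields an empty string in this sketch.
std::string ReadUtf8Text(const BlobEncodingLike &blob) {
  bool known = false;
  std::uint32_t codePage = 0;
  blob.GetEncoding(&known, &codePage);
  if (!known || codePage != kCodePageUtf8)
    return std::string();
  return std::string(static_cast<const char *>(blob.GetBufferPointer()),
                     blob.GetBufferSize());
}
} // namespace sketch
```

Carrying the encoding alongside the bytes is what allows long text such as sources and logs to cross the API without a forced conversion, while short parameters stay as plain wide strings.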
The DLL provides a "library" component that provides utility functions to create and transform blobs and strings.
In the utils/hct directory, an hctdb.py file can be found that initializes a number of Python object instances describing different aspects of the DXIL specification. These act as a queryable, programmable repository of information that feeds into other tasks, such as generating documentation, generating code, or performing compatibility checks across versions.
We require that the database be kept up-to-date for the concepts embedded there to drive a number of code-generation tasks, which can be found in other .py files in that same directory.
llvm::Module is the type that represents a shader program. It includes metadata nodes to provide details around the ABI, flags, etc. However, manipulation of all this information in terms of metadata is not very efficient or convenient.
As part of the work with HLSL, we introduce two modules that are attached in-memory to an llvm::Module: a high-level HLModule, and a low-level DxilModule. The high-level module is used in the early passes to deal with HLSL-as-a-language concepts, such as intrinsics, matrices and vectors; the low-level module is used to deal with concepts as they exist in the DXIL specification. Only one of these additional modules exists at any given time; the DxilGenerationPass that does the translation destroys the high-level representation and creates a low-level one as part of its work.
To preserve many of the benefits of LLVM's modular pipeline, it is useful to serialize and deserialize shaders at different stages of processing, and so both HLModule and DxilModule provide support for this. The expectation for a wholesale compilation from source, however, is that this information lives only in memory until it's ready to be serialized out in final DXIL form. As such, various passes along the way may need to update these modules to maintain consistency (for example, if global DCE removes a variable, the corresponding resource mapping that reflects the shader ABI should be cleaned up as well).
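The ownership and consistency rules described above can be sketched with plain C++ types. This is a toy model under stated assumptions, not the HLModule or DxilModule API: Module, HighLevelInfo, LowLevelInfo, LowerToDxil and RemoveGlobal are hypothetical names, and resources are reduced to a list of strings.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {
struct HighLevelInfo { std::vector<std::string> resources; }; // stand-in for HLModule
struct LowLevelInfo { std::vector<std::string> resources; };  // stand-in for DxilModule

// Stand-in for llvm::Module plus its attached side information.
struct Module {
  std::vector<std::string> globals;
  std::unique_ptr<HighLevelInfo> hl;  // present during the early passes
  std::unique_ptr<LowLevelInfo> dxil; // present after lowering
};

// Analogous in spirit to the generation pass: builds the low-level
// representation and destroys the high-level one, so the two never coexist
// once the pass has finished.
void LowerToDxil(Module &m) {
  assert(m.hl && !m.dxil);
  auto low = std::make_unique<LowLevelInfo>();
  low->resources = std::move(m.hl->resources);
  m.hl.reset();
  m.dxil = std::move(low);
}

// A transformation that deletes a global (as global DCE would) must also
// drop the matching entry from whichever side table currently exists.
void RemoveGlobal(Module &m, const std::string &name) {
  auto erase = [&name](std::vector<std::string> &v) {
    v.erase(std::remove(v.begin(), v.end(), name), v.end());
  };
  erase(m.globals);
  if (m.hl)
    erase(m.hl->resources);
  if (m.dxil)
    erase(m.dxil->resources);
}
} // namespace sketch
```

Holding each representation through an owning pointer makes the "only one at a time" rule easy to check, and routing removals through a single helper is one way to keep the side table from describing resources that no longer exist in the module.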