Commit

Reviewed ILGPU documentation.
MoFtZ authored and m4rs-mt committed Apr 1, 2022
1 parent d350759 commit 02020ba
Showing 1 changed file with 4 additions and 99 deletions.
103 changes: 4 additions & 99 deletions Docs/Inside-ILGPU.md
@@ -3,11 +3,8 @@
ILGPU features a modern parallel processing, transformation and compilation model.
It allows parallel code generation and transformation phases to reduce compile time and improve overall performance.

-However, parallel code generation in the frontend module is disabled by default.
-It can be enabled via the enumeration flag `ContextFlags.EnableParallelCodeGenerationInFrontend`.

The global optimization process can be controlled with the enumeration `OptimizationLevel`.
-This level can be specified by passing the desired level to the `ILGPU.Context` constructor.
+This level can be specified by passing the desired level to the `Optimize` method of `Context.Builder`.
If the optimization level is not explicitly specified, the level is automatically set to `OptimizationLevel.O1`.

The `OptimizationLevel.O2` level uses additional transformations that increase compile time but yield potentially better GPU code.
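
As a minimal sketch of the builder-based configuration described above (not part of this commit), the following program requests `OptimizationLevel.O2` when creating a context. It assumes the ILGPU 1.x builder API (`Context.Create` with a `Context.Builder` callback and `Default()` to enable the default accelerators). The class name is hypothetical, and the extra `using` directives are an assumption to cover the namespace in which `OptimizationLevel` and the builder extensions are declared in your ILGPU version.

```c#
using System;
using ILGPU;
using ILGPU.IR.Transformations; // assumption: OptimizationLevel may be declared here
using ILGPU.Runtime;            // assumption: builder extension methods may be declared here

class OptimizationLevelSketch // hypothetical name, for illustration only
{
    static void Main()
    {
        // Request the O2 pipeline instead of the default O1.
        // O2 runs additional transformations, so kernels take longer to compile.
        using var context = Context.Create(builder =>
            builder.Default().Optimize(OptimizationLevel.O2));

        Console.WriteLine("Created an ILGPU context with OptimizationLevel.O2");
    }
}
```

Omitting the `Optimize` call leaves the context at `OptimizationLevel.O1`, as stated above.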
@@ -35,32 +32,6 @@ It can be used to manually compile kernels for a specific platform.
Note that **you do not have to create custom backend instances** on your own when using the ILGPU runtime.
Accelerators already carry associated and configured backends that are used for high-level kernel loading.

-```c#
-class ...
-{
-    static void Main(string[] args)
-    {
-        using (var context = new Context())
-        {
-            // Creates a user-defined MSIL backend for .NET code generation
-            using (var cpuBackend = new DefaultILBackend(context))
-            {
-                // Use custom backend
-            }
-
-            // Creates a user-defined backend for NVIDIA GPUs using compute capability 5.0
-            using (var ptxBackend = new PTXBackend(
-                context,
-                PTXArchitecture.SM_50,
-                TargetPlatform.X64))
-            {
-                // Use custom backend
-            }
-        }
-    }
-}
-```

## IRContext

An `IRContext` manages and caches intermediate-representation (IR) code, which can be reused during the compilation process.
@@ -70,19 +41,6 @@ An `IRContext` is not tied to a specific `Backend` instance and can be reused ac
Note that the main ILGPU `Context` already has an associated `IRContext` that is used for all high-level kernel-loading functions.
Consequently, users are not required to manage their own contexts in general.

-```c#
-class ...
-{
-    static void Main(string[] args)
-    {
-        var context = new Context();
-
-        var irContext = new IRContext(context);
-        // ...
-    }
-}
-```

## Compiling Kernels

Kernels can be compiled manually by requesting a code-generation operation from the backend yielding a `CompiledKernel` object.
@@ -93,30 +51,6 @@ Alternatively, you can cast a `CompiledKernel` object to its appropriate backend

We recommend that you use the [high-level kernel-loading concepts of ILGPU](ILGPU-Kernels) instead of the low-level interface.

-```c#
-class ...
-{
-    public static void MyKernel(Index index, ...)
-    {
-        // ...
-    }
-
-    static void Main(string[] args)
-    {
-        using var context = new Context();
-        using var b = new PTXBackend(context, ...);
-        // Compile kernel using no specific KernelSpecialization settings
-        var compiledKernel = b.Compile(
-            typeof(...).GetMethod(nameof(MyKernel), BindingFlags.Public | BindingFlags.Static),
-            default);
-
-        // Cast kernel to backend-specific PTXCompiledKernel to access the PTX assembly
-        var ptxKernel = compiledKernel as PTXCompiledKernel;
-        System.IO.File.WriteAllBytes("MyKernel.ptx", ptxKernel.PTXAssembly);
-    }
-}
-```

## Loading Compiled Kernels

Compiled kernels have to be loaded by an accelerator first before they can be executed.
@@ -131,35 +65,6 @@ An accelerator object offers different functions to load and configure kernels:
* `LoadKernel`
  Loads explicitly and implicitly grouped kernels. However, implicitly grouped kernels will be launched with a group size that is equal to the warp size.

-```c#
-class ...
-{
-    static void Main(string[] args)
-    {
-        ...
-        var compiledKernel = backend.Compile(...);
-
-        // Load implicitly grouped kernel with an automatically determined group size
-        var k1 = accelerator.LoadAutoGroupedKernel(compiledKernel);
-
-        // Load implicitly grouped kernel with custom group size
-        var k2 = accelerator.LoadImplicitlyGroupedKernel(compiledKernel);
-
-        // Load any kernel (explicitly and implicitly grouped kernels).
-        // However, implicitly grouped kernels will be dispatched with a group size
-        // that is equal to the warp size of its associated accelerator
-        var k3 = accelerator.LoadKernel(compiledKernel);
-
-        ...
-
-        k1.Dispose();
-        k2.Dispose();
-        // Leave k3 to the GC
-        // ...
-    }
-}
-```

## Direct Kernel Launching

A loaded kernel can be dispatched using the `Launch` method.
@@ -169,7 +74,7 @@ For performance reasons, we strongly recommend the use of typed kernel launchers
```c#
class ...
{
-    static void MyKernel(Index index, ArrayView<int> data, int c)
+    static void MyKernel(Index1D index, ArrayView<int> data, int c)
    {
        data[index] = index + c;
    }
@@ -210,7 +115,7 @@ These loading methods work similarly to these versions, e.g. `LoadAutoGroupe
```c#
class ...
{
-    static void MyKernel(Index index, ArrayView<int> data, int c)
+    static void MyKernel(Index1D index, ArrayView<int> data, int c)
    {
        data[index] = index + c;
    }
@@ -225,7 +130,7 @@ class ...
using (var k = accelerator.LoadAutoGroupedKernel(compiledKernel))
{
    var launcherWithCustomAcceleratorStream =
-        k.CreateLauncherDelegate<AcceleratorStream, Index, ArrayView<int>>();
+        k.CreateLauncherDelegate<AcceleratorStream, Index1D, ArrayView<int>>();
    launcherWithCustomAcceleratorStream(someStream, buffer.Extent, buffer.View, 1);

    ...