Releases · m4rs-mt/ILGPU

03 Jan 03:00

m4rs-mt

v0.8.1.1

38bd74e

Release v0.8.1.1

The new stable version offers significant performance and code quality improvements of the generated kernel programs.

Fixed issues related to Trace and Debug asserts (#176).
Improved compile-time performance by up to 4X (#110).
Reduced memory footprint by up to 3X (#109, #118).
Added new optimization level O2 to enable expensive and aggressive optimizations (#70, #110, #111, #121).
The NuGet package now contains release builds of the compiler to improve runtime performance (#130).
Added new IR verifier that can be enabled via ContextFlags.EnableVerifier (#121).
Added generation of vectorized instructions to PTX backend (#111).
Fixed critical code-generation issue on Unix platforms (#116).
Added dynamic shared memory support for all platforms (#97, #98).
Added new KernelInfo objects to kernel loaders in order to query detailed kernel statistics (e.g. amount of local memory in bytes) (#104).


03 Jan 02:59

m4rs-mt

v0.8.1-beta1

2ae313e

Release v0.8.1-beta1 (Pre-release)

The new beta version offers significant performance improvements of the generated kernel programs.

Improved compile-time performance by up to 4X (#110).
Reduced memory footprint by up to 3X (#109, #118).
Added new optimization level O2 to enable expensive and aggressive optimizations (#70, #110, #111, #121).
The NuGet package now contains release builds of the compiler to improve runtime performance (#130).
Added new IR verifier that can be enabled via ContextFlags.EnableVerifier (#121).
Added generation of vectorized instructions to PTX backend (#111).
Fixed critical code-generation issue on Unix platforms (#116).
Added dynamic shared memory support for all platforms (#97, #98).
Added new KernelInfo objects to kernel loaders in order to query detailed kernel statistics (e.g. amount of local memory in bytes) (#104).


03 Jan 02:58

m4rs-mt

v0.8

37afd05

Release v0.8.0

The new stable version offers significant performance and code quality improvements of the generated kernel programs.

Added support for on-the-fly specialization of kernels using dynamic partial evaluation.
Added support for dynamic shared memory (CPU & Cuda backends).
Added new KernelConfig structure to specify launch dimensions for explicitly grouped kernels.
Added new Index1 structure to avoid name clashes with new System.Index structure.
Added additional tuple conversion methods to Index2 and Index3 types.
Added new EntryPointDescription structure to specify an entry point and its index type.
Added RuntimeKernelConfig structure to combine static and dynamic information about a particular kernel launch.
Added support for linear arrays in local memory.
Added support for enum-value interop (#66).
Reworked explicitly grouped kernel launchers to use the new KernelConfig structure instead of GroupedIndex types.
Simplified static Grid and Group properties.
Removed all GroupedIndex types.
Updated the whole compilation pipeline to enable more aggressive optimizations.
Significantly improved performance of emitted PTX and OpenCL code by enabling more aggressive optimizations and clever code generation (#70).
Added support for "unmanaged" C# structures in the scope of buffers and views.
Reworked PTX backend to support all API changes and to fix several critical code-generation issues. This also includes emission of PTX instructions that mimic the Cuda compiler (#68).
Reworked OpenCL backend to support all API changes and to fix several critical code-generation issues (#67, #72, #73, #74, #78, #85, #88, #91, #92).
New debug information input module to support the latest PDB format updates.
Considerably improved error messages using debug information (#86).
Reduced memory consumption during the compilation process.
Performance improvements of the internal compilation pipeline.
Improved performance of kernel launchers.
Extended CudaAPI to support page-locked host-memory allocation functions.
Extended ExchangeBuffer to use new page-locked memory allocation (if available).
Added new IR-rewriter API to perform more advanced IR transformations.
Adapted all existing transformations to use the new rewriter API.
Reduced memory consumption of all nodes by compressing information.
Redesigned several IR nodes to support global program transformations.
Reworked implementation of GetSubView in the context of generic and multidimensional array views (#19).
Fixed several issues in the scope of address-space inference.
Fixed critical code generation issues that could occur when replacing values.

Special thanks to @MoFtZ for contributing to this release.


03 Jan 02:55

m4rs-mt

v0.8.0-beta3

133b295

Release v0.8.0-beta3 (Pre-release)

Considerably improved error messages using debug information (#86).
Reduced memory consumption during the compilation process.
Performance improvements of the internal compilation pipeline.
Added support for "unmanaged" C# structures in the scope of buffers and views.
New debug information input module to support the latest PDB format updates.
Fixed several OpenCL code-generation issues (#85, #88, #91, #92).

Special thanks to @MoFtZ for contributing to this release.


03 Jan 02:54

m4rs-mt

v0.8.0-beta2

d7cb589

Release v0.8.0-beta2 (Pre-release)

Significantly improved performance of emitted PTX and OpenCL code by enabling more aggressive optimizations and clever code generation (#70).
Improved performance of kernel launchers.
Added support for linear arrays in local memory.
Added support for enum-value interop (#66).
Reworked PTXBackend to support all API changes and to fix several critical code-generation issues. This also includes emission of PTX instructions that mimic the Cuda compiler.
Reworked OpenCL backend to support all API changes and to fix several critical code-generation issues (#72, #73, #74, #78).
Updated the whole compilation pipeline to enable more aggressive optimizations.
Added new IR-rewriter API to perform more advanced IR transformations.
Adapted all existing transformations to use the new rewriter API.
Reduced memory consumption of all nodes by compressing information.
Redesigned several IR nodes to support global program transformations.

Special thanks to @MoFtZ for contributing to this release.


03 Jan 02:52

m4rs-mt

v0.8.0-beta1

15923a6

Release v0.8.0-beta1 (Pre-release)

Added support for on-the-fly specialization of kernels using dynamic partial evaluation.
Added support for dynamic shared memory (CPU & Cuda backends).
Added new KernelConfig structure to specify launch dimensions for explicitly grouped kernels.
Reworked explicitly grouped kernel launchers to use the new KernelConfig structure instead of GroupedIndex types.
Simplified static Grid and Group properties.
Added new Index1 structure to avoid name clashes with new System.Index structure.
Added additional tuple conversion methods to Index2 and Index3 types.
Added new EntryPointDescription structure to specify an entry point and its index type.
Added RuntimeKernelConfig structure to combine static and dynamic information about a particular kernel launch.
Removed all GroupedIndex types.
Extended PTXInstructions to support bool-based IOs in PTXBackend (#68).
Extended ExchangeBuffer to use new page-locked memory allocation (if available).
Extended CudaAPI to support page-locked host-memory allocation functions.
Reworked implementation of GetSubView in the context of generic and multidimensional array views (#19).
Fixed several issues in the scope of address-space inference.
Fixed critical code generation issues that could occur when replacing values.
Fixed invalid pointer types in the scope of AtomicCAS operations on AMD hardware (#67).


03 Jan 02:50

m4rs-mt

v0.7.1

5334a68

Release v0.7.1

Added extension method to load the effective address for Cuda and CPU-based array views.
Added support for data blocks (value containers) to ease interop with value tuples.
Added additional primitive data blocks to simplify operations on tuples consisting of primitive values.
Added new ExchangeBuffer class to simplify memory transfers between CPU and GPU memory.
Fixed invalid sub-group extension name in CLAccelerator.
Fixed invalid association of supported and unsupported CL accelerators.
Removed obsolete dispose functionality from AcceleratorId classes.
Fixed OpenCL code generator for float values that are assigned integer values.
Fixed invalid creation of kernel interop types in OpenCL backend.
Made ABI thread safe to support concurrent queries of size/alignment information.


03 Jan 02:47

m4rs-mt

v0.7.0

4c553aa

Release v0.7.0

Added support for .NET Standard 2.1.
Added support for OpenCL-compatible GPUs (beta).
Added parallel code generation in backends to improve code-generation speed.
Added minimum CUDA driver version detection.
Enabled adaptive shared-memory allocation in CPUAccelerator.
Added new Utility.Select method that can be used to create highly-efficient select instructions in favor of if branches.
Added support to access Grid and Group indices via properties.
Added support for generic Warp intrinsics that will be automatically generated by the compiler.
Redesigned intrinsic math functions and moved XMath functions to the ILGPU.Algorithms library. Use the new IntrinsicMath class for math functions that are supported on all platforms.
Reworked intrinsic functions to allow custom implementations of intrinsics for different backends.
Ported project to VS2019 including all static-program analysis checks.
Applied general code cleanup to be compliant with the new analysis checks.
Redesigned AcceleratorId functionality.
Updated CudaMemoryBuffer to support MemSetToZero using alternate streams.
Fixed retrieving version number of ILGPU assembly.
Fixed non-deterministic generation of Phi mappings.
Fixed invalid loading of small basic types onto the evaluation stack.
Added utility property to Accelerator to resolve a launch extent with the maximum number of groups.
Fixed invalid shared-memory allocation within non-kernel functions in PTXBackend.

Special thanks to @MoFtZ for contributing to this release.


03 Jan 02:45

m4rs-mt

v0.6.0

db7bdd2

Release v0.6.0

Greatly improved ILGPU version that includes significant performance and code quality improvements.

Added support for new GeForce RTX cards.
Added initial support for arrays in kernels.
Added additional 3D indexing functionality to ArrayView types.
Added automatic binding of accelerators in advanced multi-GPU scenarios.
Tested debugging and profiling capabilities on NVIDIA GPUs.
Released test framework to verify generated kernel code.
Improved performance of predicates in PTXBackend.
Removed strict array-length restriction from allocation nodes.
Enhanced generation of get/set field operations.
Optimized generation of conditional branches.
Fixed invalid generation of predicate barriers in PTXBackend.
Fixed invalid register allocation of string types in PTXBackend.
Removed explicit tracking of predecessors in phi nodes.
Fixed invalid debug assertion in SequencePoint.
Fixed invalid alignment of shared-memory allocations in PTXBackend.
Fixed invalid shared memory configuration of Cuda kernels.

Special thanks to @MoFtZ and @mikhail-khalizev for contributing to this release.


03 Jan 02:42

m4rs-mt

v0.5.1

4a92499

Release v0.5.1

Improved version of v0.5 that contains bug fixes, performance improvements, and features based on community feedback.

Polished error messages and util methods.
Fixed invalid DebuggerDisplay attributes on array views.
Added support for loading addresses of static fields.
Added support to disable kernel caches and automatic disposal of kernels and memory buffers (Community request).
Extended kernel loaders with additional overloads.
Added support to clear internal caches (Community request).
Fixed invalid extent and bounds checks in MemoryBuffer.CopyTo.
Fixed invalid initialization of PTX-specific intrinsic functions.
Fixed invalid load/store instructions of bytes in PTXBackend.
Fixed invalid generation of null values in PTXBackend.

