- added a JNIExampleClasses cmake target to build all java examples
- added a file with posix autocompletions for yacx.sh
- add nix-shell file
if you're on NixOS or using the Nix package manager, you can now easily create an environment with cudatoolkit and jdk by entering:
$ nix-shell
- multiple small issues with the CMakeLists
- fix logger colors and newline issues
- usage of dynamic shared memory
- add JNI example for executing C code
- add JNI benchmark tests for the gemm and reduce kernels
- JNI
- uses pinned memory
- createHalfTransosed converts a float matrix to halfs and transposes it
- add example for executing C code
- fix typo libary -> library
- fix compiler warnings
- host-code optimizations for faster kernel launch
- new static Devices class to easily filter devices
- by name
- by UUID
- by lambda function
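As a rough illustration of the lambda-based filtering mentioned above, the sketch below filters a list of device records with a caller-supplied predicate. `DeviceInfo` and `filter_devices` are hypothetical names invented for this example and are not the yacx `Devices` API; filtering by name or UUID is just a special case of the same predicate.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a device record; not the yacx Device class.
struct DeviceInfo {
  std::string name;
  std::string uuid;
  std::size_t memory_bytes;
};

// Returns a copy of every device for which the predicate `keep` is true.
std::vector<DeviceInfo> filter_devices(
    const std::vector<DeviceInfo>& devices,
    const std::function<bool(const DeviceInfo&)>& keep) {
  std::vector<DeviceInfo> result;
  std::copy_if(devices.begin(), devices.end(), std::back_inserter(result),
               keep);
  return result;
}
```

A caller could then write `filter_devices(all, [](const DeviceInfo& d) { return d.name == "Tesla V100"; })` to filter by name, or compare `d.uuid` to filter by UUID.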
- Logger
- outputs in color
- set logging levels/file output per command line, try with example_logger.cpp
- --file=yacx.log: set file output
- --log=DEBUG: set logging level
- KernelTime has been refactored
- every instance calculates effective bandwidth
- up
- down
- total
- overloaded << operator
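For readers unfamiliar with the term, effective bandwidth is the number of bytes moved divided by the time it took. The sketch below shows one way the up, down and total figures could be derived. `TransferTimes` and the function names are hypothetical and not taken from yacx, and defining "total" as all transferred bytes over upload + kernel + download time is an assumption made for this example.

```cpp
#include <cstddef>

// Hypothetical record of one kernel run; field names are illustrative only.
struct TransferTimes {
  std::size_t bytes_up;    // bytes copied host -> device
  std::size_t bytes_down;  // bytes copied device -> host
  double ms_up;            // upload time in milliseconds
  double ms_down;          // download time in milliseconds
  double ms_kernel;        // kernel execution time in milliseconds
};

// Effective bandwidth in GB/s: (bytes / 1e9) gigabytes per (ms / 1e3) seconds.
double effective_bandwidth_gbs(std::size_t bytes, double milliseconds) {
  return (static_cast<double>(bytes) / 1e9) / (milliseconds / 1e3);
}

double bandwidth_up(const TransferTimes& t) {
  return effective_bandwidth_gbs(t.bytes_up, t.ms_up);
}

double bandwidth_down(const TransferTimes& t) {
  return effective_bandwidth_gbs(t.bytes_down, t.ms_down);
}

// Assumption: "total" covers all transferred bytes over the whole run.
double bandwidth_total(const TransferTimes& t) {
  return effective_bandwidth_gbs(t.bytes_up + t.bytes_down,
                                 t.ms_up + t.ms_kernel + t.ms_down);
}
```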
- Logger is now static
- allows setting stream output to cerr or cout
- additionally output log to file
- C Executor
- Benchmarking
- can benchmark CUDA and even works with rise-lang/executor
- KernelArgCreator to easily benchmark with different KernelArg inputs
- more example Kernels
- JNI
- more java and scala examples
- JUnit tests
- KernelArg Outputs are instantly reusable as Inputs (see #89)
- added sbt project file
- JNI
- exceptions in case of illegal arguments (e.g. NULL)
- split up the KernelArg class into BooleanArg, ByteArg, ShortArg, IntArg, LongArg, HalfArg, FloatArg, DoubleArg and PaddingArg
- PaddingArg helps to easily pad matrices to work more easily with e.g. TensorCores
- HalfArg will convert a float array with a CUDA Kernel
- Java files were moved to a proper package: src/{java=>main/java/yacx}/
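To illustrate what padding a matrix means in this context, the sketch below copies a row-major float matrix into a larger buffer whose dimensions are rounded up to a given multiple (tensor-core kernels commonly expect multiples such as 16). `round_up` and `pad_matrix` are hypothetical helpers written for this example; they do not reflect how PaddingArg is implemented.

```cpp
#include <cstddef>
#include <vector>

// Rounds n up to the next multiple of `multiple` (multiple must be > 0).
std::size_t round_up(std::size_t n, std::size_t multiple) {
  return (n + multiple - 1) / multiple * multiple;
}

// Copies a rows x cols row-major matrix into a buffer whose dimensions are
// rounded up to `multiple`; the extra cells are set to `fill`.
std::vector<float> pad_matrix(const std::vector<float>& src, std::size_t rows,
                              std::size_t cols, std::size_t multiple,
                              float fill = 0.0f) {
  const std::size_t padded_rows = round_up(rows, multiple);
  const std::size_t padded_cols = round_up(cols, multiple);
  std::vector<float> dst(padded_rows * padded_cols, fill);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      dst[r * padded_cols + c] = src[r * cols + c];
    }
  }
  return dst;
}
```

After the kernel has run, the caller would read back only the original rows x cols region and ignore the padding.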
- created a classDiagram
- a Code of Conduct
- a Contribution Guideline
- renamed (see #78)
- Headers.{length=>numHeaders}
- Options.{options=>content}
- KernelArgs.m_{chArgs=>voArgs}
- cleaned up repo
- fixed workflows for pull requests
- issue templates
- updated README.md
- rename project from cudaexecutor/cudacompiler to yacx - yet another cudaexecutor
- KernelTime: measure time of kernel execution as well as uploading and downloading of KernelArgs
- fully featured JNI
- lots of java and scala examples
- build and execute script for java/scala: cudaexecutor.sh
- KernelArgs refactor
- moved KernelArg uploading into KernelArgs
- c++ bindings
- get devices
- template kernels
- logging and exception for debugging
- nvrtc option class
- mostly replicated as a JNI
- Classes renamed
- ProgramArg => KernelArg
- Program => Source
- Kernel => Kernel, Program
- c++ bindings
- execute arbitrary cuda kernels