Currently, all neurons/synapses are implemented as in-place operations using nested for loops. While this is great for single-CPU performance, it has a number of downsides:

- We can't automatically use BLAS or sparse BLAS routines when available.
- We can't use AD libraries like Zygote, since we mutate arrays, which most AD libraries don't support easily, performantly, or at all (see the sketch after this list).
- To run on GPUs, we'd either need to switch to GPUifyLoops.jl (which isn't a terrible idea) or write custom kernels (not difficult, but duplicative, especially if supporting both CUDA and AMDGPU, which I fully intend to implement).
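To make the contrast concrete, here's a minimal sketch using a hypothetical leaky integrate-and-fire update; the function and field names are illustrative, not this package's actual API:

```julia
# Current style: in-place nested loop. Fast on a single CPU, but Zygote
# can't differentiate through the mutation of `v`, and there's no way to
# hand the loop body off to BLAS or a GPU without rewriting it.
function step_inplace!(v::Vector{Float64}, I::Vector{Float64}, τ, dt)
    for i in eachindex(v)
        v[i] += dt * (I[i] - v[i]) / τ
    end
    return v
end

# Proposed style: out-of-place broadcast. Allocates a new array, but is
# AD-friendly and runs unchanged on GPU array types (e.g. a CuArray),
# since broadcasting dispatches to the array's own kernel machinery.
function step_broadcast(v::AbstractVector, I::AbstractVector, τ, dt)
    return v .+ dt .* (I .- v) ./ τ
end
```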
An out-of-place (OOP) implementation using broadcasting should solve many of these problems. I propose implementing these in parallel to our current implementations, using dispatch or kwargs to choose which to use at runtime, as sketched below.
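As a rough illustration of the runtime-selection idea (building on the two functions from the previous sketch; none of these names are a committed design), a small trait type lets dispatch pick the code path, with a kwarg wrapper for convenience:

```julia
# Hypothetical style traits; purely illustrative names.
abstract type EvalStyle end
struct InPlace <: EvalStyle end
struct OutOfPlace <: EvalStyle end

# Dispatch selects the implementation: the in-place path mutates `v`,
# the out-of-place path returns a new array.
evaluate(::InPlace, v, I, τ, dt) = step_inplace!(v, I, τ, dt)
evaluate(::OutOfPlace, v, I, τ, dt) = step_broadcast(v, I, τ, dt)

# Equivalently, a keyword argument can select the style at call sites,
# defaulting to the current in-place behavior.
evaluate(v, I, τ, dt; style::EvalStyle = InPlace()) =
    evaluate(style, v, I, τ, dt)
```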