Currently, all neurons/synapses are implemented as in-place operations using nested for loops. While this is great for single-CPU performance, it has a number of downsides:

- We can't automatically use BLAS or sparse BLAS routines when available.
- We can't use AD libraries like Zygote, since we mutate arrays, which most AD libraries don't support easily, performantly, or at all (see the sketch after this list).
- To run on GPUs, we'd either need to switch to GPUifyLoops.jl (which isn't a terrible idea) or write custom kernels (not difficult, but duplicative, especially if supporting both CUDA and AMDGPU, which I fully intend to implement).
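To make the contrast concrete, here's a minimal sketch using a hypothetical leaky integrate-and-fire update; the function and field names are illustrative, not this package's actual API:

```julia
# Current style: in-place nested loop. Fast on a single CPU, but Zygote
# can't differentiate through the mutation of `v`, and there's no way to
# hand the loop body off to BLAS or a GPU without rewriting it.
function step_inplace!(v::Vector{Float64}, I::Vector{Float64}, τ, dt)
    for i in eachindex(v)
        v[i] += dt * (I[i] - v[i]) / τ
    end
    return v
end

# Proposed style: out-of-place broadcast. Allocates a new array, but is
# AD-friendly and runs unchanged on GPU array types (e.g. a CuArray),
# since broadcasting dispatches to the array's own kernel machinery.
function step_broadcast(v::AbstractVector, I::AbstractVector, τ, dt)
    return v .+ dt .* (I .- v) ./ τ
end
```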
An out-of-place (OOP) implementation using broadcasting should solve many of these problems. I propose implementing these in parallel to our current implementations, using dispatch or kwargs to choose which to use at runtime, as sketched below.
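As a rough illustration of the runtime-selection idea (building on the two functions from the previous sketch; none of these names are a committed design), a small trait type lets dispatch pick the code path, with a kwarg wrapper for convenience:

```julia
# Hypothetical style traits; purely illustrative names.
abstract type EvalStyle end
struct InPlace <: EvalStyle end
struct OutOfPlace <: EvalStyle end

# Dispatch selects the implementation: the in-place path mutates `v`,
# the out-of-place path returns a new array.
evaluate(::InPlace, v, I, τ, dt) = step_inplace!(v, I, τ, dt)
evaluate(::OutOfPlace, v, I, τ, dt) = step_broadcast(v, I, τ, dt)

# Equivalently, a keyword argument can select the style at call sites,
# defaulting to the current in-place behavior.
evaluate(v, I, τ, dt; style::EvalStyle = InPlace()) =
    evaluate(style, v, I, τ, dt)
```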