
Convolutions for DL #52

Open
MikeInnes opened this issue Jan 4, 2018 · 8 comments

@MikeInnes

I'd like to use this excellent package for the filters commonly used in deep learning. I tried a naive implementation here, but performance isn't great and I'm wondering if we need support in this package to improve that.

To give a rough outline of the semantics, we have a D×C×N image array where: D are the data dimension(s) (e.g. width and height), C is the channel dimension (e.g. colour) and N is a batch dimension (so we can convolve N independent images at once). This is convolved with a K×C×F filter (K being the filter dimension(s) and F being the number of filters / output channels) to produce a D′×F×N output array. Each slice along F is treated as a separate "filter" that spans the channel dimension, and the filter outputs are concatenated.
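For concreteness, here is a minimal sketch of those semantics for two spatial dimensions (input W×H×C×N, filter KW×KH×C×F), with "valid" boundaries and no stride; the function name is just illustrative and the loop is meant to pin down the indexing, not to be fast:

```julia
# Naive direct "convolution" (really cross-correlation, as in DL frameworks):
# x is W×H×C×N, w is KW×KH×C×F, output is (W-KW+1)×(H-KH+1)×F×N.
function naive_conv(x::AbstractArray{T,4}, w::AbstractArray{T,4}) where T
    W, H, C, N = size(x)
    KW, KH, Cw, F = size(w)
    @assert C == Cw "input and filter channel counts must match"
    out = zeros(T, W - KW + 1, H - KH + 1, F, N)
    for n in 1:N, f in 1:F, j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        # each output channel f sums over all input channels and the window
        for c in 1:C, kj in 1:KH, ki in 1:KW
            acc += x[i + ki - 1, j + kj - 1, c, n] * w[ki, kj, c, f]
        end
        out[i, j, f, n] = acc
    end
    return out
end
```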

Hope that's clear; I'm happy to give more references if that's useful. Does this look like something that's easy and appropriate for this package to support, or would it require an entirely new implementation?

@timholy
Member

timholy commented Jan 4, 2018

That seems to be an important operation, so I'm totally willing to support it here. To my mind the most unfortunate part is the fact that the color dimension is not the fastest dimension (why did they do that?), but this seems to be standard so I guess we just have to live with it.

I'm pretty swamped with Julia 0.7 changes right now (including one that I think you will be very happy with), but I'll get to this eventually.

@jekbradbury

jekbradbury commented Jan 4, 2018

Different deep learning frameworks make different choices about whether the spatial or channel dimension is the innermost/fastest one: NCHW is the term for what Mike is describing in row-major frameworks, but there's also NHWC (some discussion is here and here). It looks like NHWC is slightly faster on CPU and NCHW is slightly faster on GPU, with the differences among optimized implementations small enough that it's almost never worth transposing data you already have in one of those formats. I wouldn't expect any of that to change in a column-major environment, other than NHWC being represented to the user as CHWN and NCHW as WHCN. Both are natively supported by cuDNN and by MKL-DNN, the latter likely the fastest open-source CPU implementation for deep learning convolutions.
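To make the column-major correspondence concrete, here's a small illustration (array sizes are arbitrary, and this is only about index order, not about any particular library's API):

```julia
# Row-major NCHW has W varying fastest in memory, which in column-major Julia
# corresponds to indexing (W, H, C, N); row-major NHWC has C fastest,
# corresponding to (C, W, H, N).
W, H, C, N = 32, 32, 3, 8
x_whcn = rand(Float32, W, H, C, N)           # "NCHW" in row-major terminology
x_cwhn = permutedims(x_whcn, (3, 1, 2, 4))   # "NHWC": channel is now fastest
@assert size(x_cwhn) == (C, W, H, N)
```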

@timholy
Member

timholy commented Jan 4, 2018

In Julia there are likely to be performance advantages stemming from the fact that ColorTypes can act as fixed-size vectors, so the compiler can unroll loops automatically. One could surely do that by hand for channel-slow representations, but it's more of a pain in the neck.
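As a toy illustration of that point (my own example, using only ColorTypes; the 3×3 box blur is just a stand-in for a real kernel): each RGB{Float32} pixel carries its three channels contiguously, so the per-channel work is three field operations the compiler can unroll.

```julia
using ColorTypes  # RGB{Float32} is an isbits struct of three Float32 fields

function blur3x3(img::AbstractMatrix{RGB{Float32}})
    out = copy(img)                      # border pixels left untouched in this sketch
    for j in 2:size(img, 2)-1, i in 2:size(img, 1)-1
        r = g = b = 0.0f0
        for dj in -1:1, di in -1:1
            p = img[i+di, j+dj]
            # the "channel loop" is just three field accesses per pixel
            r += red(p); g += green(p); b += blue(p)
        end
        out[i, j] = RGB{Float32}(r/9, g/9, b/9)
    end
    return out
end
```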

@jekbradbury

The only place in a convolutional NN where the "channel" dimension actually corresponds to colors and has a small fixed size is in the input layer; in all other layers there are typically many more channels (from dozens to thousands) and unrolling may be counterproductive from a compilation time standpoint.

@MikeInnes
Author

Yeah, I think there's a small mindset difference going on here: in ML we tend to see the channels as a stack of related images, rather than as a single image with N-dimensional pixels. It's somewhat more like a frame.

@timholy
Member

timholy commented Jan 5, 2018

Yes, when the channel dimension is that large, it's definitely better to use a loop. Thanks for clarifying!

@GunnarFarneback

To add to this, the output is frequently computed with a stride in the spatial dimensions, producing a smaller image. That is important to optimize for.
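A toy 1-D version of what that looks like (illustrative name and "valid" boundary handling, not an existing API): only every stride-th input position is evaluated, so the output is correspondingly smaller.

```julia
function strided_conv1d(x::AbstractVector, w::AbstractVector; stride::Int = 2)
    K = length(w)
    L = fld(length(x) - K, stride) + 1        # number of output samples
    out = similar(x, L)
    for i in 1:L
        offset = (i - 1) * stride             # skip `stride` input samples per output
        s = zero(eltype(x))
        for k in 1:K
            s += x[offset + k] * w[k]
        end
        out[i] = s
    end
    return out
end
```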

@Tokazama

Tokazama commented Jan 13, 2020

I thought I'd give a bit of an update on this for people who have expressed interest in tackling this issue.

We have https://github.com/FluxML/NNlib.jl for neural network filters in Julia. If we find a path forward to compatibility with it, we would get GPU support for NN filters and could more easily hook into a lot of the machine learning libraries in Julia. I'm not sure whether it will be possible to get native GPU code to handle/optimize colorant type calculations. It may be best to just decide at what point in the convolution it is worth converting to a Float32 array and back.

Also take a look at https://github.com/JuliaGPU/CuArrays.jl for some GPU interop. I don't think there's currently a way to fully interact with AMD GPUs from Julia's compiler yet, but that's being worked on at https://github.com/JuliaGPU/AMDGPUnative.jl.
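For reference, a hedged sketch of that "convert to Float32 and back" route, assuming NNlib's conv takes W×H×Cin×N inputs with KW×KH×Cin×Cout filters and that ImageCore's channelview is available (check the current NNlib docs before relying on the exact layout):

```julia
using NNlib, ImageCore

# Build a small RGB image, move it into the dense Float32 layout NNlib expects,
# and run a bank of 16 3×3 filters over its 3 colour channels.
img = RGB{Float32}.(rand(Float32, 64, 64), rand(Float32, 64, 64), rand(Float32, 64, 64))
x = permutedims(Float32.(channelview(img)), (2, 3, 1))  # C×W×H → W×H×C
x = reshape(x, size(x)..., 1)                           # add the batch dim: W×H×C×1
w = randn(Float32, 3, 3, 3, 16)                         # KW×KH×Cin×Cout
y = NNlib.conv(x, w)                                    # W′×H′×16×1
```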
