
Convolutions for DL #52

Open
MikeInnes opened this issue Jan 4, 2018 · 8 comments

@MikeInnes

I'd like to use this excellent package for the filters commonly used in deep learning. I tried a naive implementation here, but performance isn't great and I'm wondering if we need support in this package to improve that.

To give a rough outline of the semantics, we have a D×C×N image array where: D are the data dimension(s) (e.g. width and height), C is the channel dimension (e.g. colour) and N is a batch dimension (so we can convolve N independent images at once). This is convolved with a K×C×F filter (K being the filter dimension(s) and F being the number of filters / output channels) to produce a D′×F×N output array. Each slice along F is treated as a separate "filter" that spans the channel dimension, and the filter outputs are concatenated.
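For concreteness, here is a minimal sketch of those semantics for two spatial dimensions (input W×H×C×N, filter KW×KH×C×F), with "valid" boundaries and no stride; the function name is just illustrative and the loop is meant to pin down the indexing, not to be fast:

```julia
# Naive direct "convolution" (really cross-correlation, as in DL frameworks):
# x is W×H×C×N, w is KW×KH×C×F, output is (W-KW+1)×(H-KH+1)×F×N.
function naive_conv(x::AbstractArray{T,4}, w::AbstractArray{T,4}) where T
    W, H, C, N = size(x)
    KW, KH, Cw, F = size(w)
    @assert C == Cw "input and filter channel counts must match"
    out = zeros(T, W - KW + 1, H - KH + 1, F, N)
    for n in 1:N, f in 1:F, j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        # each output channel f sums over all input channels and the window
        for c in 1:C, kj in 1:KH, ki in 1:KW
            acc += x[i + ki - 1, j + kj - 1, c, n] * w[ki, kj, c, f]
        end
        out[i, j, f, n] = acc
    end
    return out
end
```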

Hope that's clear; I'm happy to give more references if that's useful. Does this look like something that's easy and appropriate for this package to support, or would it require an entirely new implementation?

@timholy
Member

timholy commented Jan 4, 2018

That seems to be an important operation, so I'm totally willing to support it here. To my mind the most unfortunate part is the fact that the color dimension is not the fastest dimension (why did they do that?), but this seems to be standard so I guess we just have to live with it.

I'm pretty swamped with Julia 0.7 changes right now (including one that I think you will be very happy with), but I'll get to this eventually.

@jekbradbury

jekbradbury commented Jan 4, 2018

Different deep learning frameworks make different choices about whether the spatial or channel dimension is the innermost/fastest one: NCHW is the term for what Mike is describing in row-major frameworks, but there's also NHWC (some discussion is here and here). It looks like NHWC is slightly faster on CPU and NCHW is slightly faster on GPU, with the differences among optimized implementations small enough that it's almost never worth transposing data you already have in one of those formats. I wouldn't expect any of that to change in a column-major environment, other than NHWC being represented to the user as CHWN and NCHW as WHCN. Both are natively supported by cuDNN and by MKL-DNN, the latter likely the fastest open-source CPU implementation for deep learning convolutions.
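To make the column-major correspondence concrete, here's a small illustration (array sizes are arbitrary, and this is only about index order, not about any particular library's API):

```julia
# Row-major NCHW has W varying fastest in memory, which in column-major Julia
# corresponds to indexing (W, H, C, N); row-major NHWC has C fastest,
# corresponding to (C, W, H, N).
W, H, C, N = 32, 32, 3, 8
x_whcn = rand(Float32, W, H, C, N)           # "NCHW" in row-major terminology
x_cwhn = permutedims(x_whcn, (3, 1, 2, 4))   # "NHWC": channel is now fastest
@assert size(x_cwhn) == (C, W, H, N)
```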

@timholy
Member

timholy commented Jan 4, 2018

In Julia there are likely to be performance advantages stemming from the fact that ColorTypes can act as fixed-size vectors, so the compiler can unroll loops automatically. One could surely do that by hand for channel-slow representations, but it's more of a pain in the neck.
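As a toy illustration of that point (my own example, using only ColorTypes; the 3×3 box blur is just a stand-in for a real kernel): each RGB{Float32} pixel carries its three channels contiguously, so the per-channel work is three field operations the compiler can unroll.

```julia
using ColorTypes  # RGB{Float32} is an isbits struct of three Float32 fields

function blur3x3(img::AbstractMatrix{RGB{Float32}})
    out = copy(img)                      # border pixels left untouched in this sketch
    for j in 2:size(img, 2)-1, i in 2:size(img, 1)-1
        r = g = b = 0.0f0
        for dj in -1:1, di in -1:1
            p = img[i+di, j+dj]
            # the "channel loop" is just three field accesses per pixel
            r += red(p); g += green(p); b += blue(p)
        end
        out[i, j] = RGB{Float32}(r/9, g/9, b/9)
    end
    return out
end
```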

@jekbradbury

The only place in a convolutional NN where the "channel" dimension actually corresponds to colors and has a small fixed size is in the input layer; in all other layers there are typically many more channels (from dozens to thousands) and unrolling may be counterproductive from a compilation time standpoint.

@MikeInnes
Author

Yeah, I think there's a small mindset difference going on here: in ML we tend to see the channels as a stack of related images, rather than as a single image with N-dimensional pixels. It's somewhat more like a frame.

@timholy
Member

timholy commented Jan 5, 2018

Yes, when the channel dimension is that large, it's definitely better to use a loop. Thanks for clarifying!

@GunnarFarneback

To add to this, the output is frequently computed with a stride in the spatial dimensions, producing a smaller image. That is important to optimize for.
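A toy 1-D version of what that looks like (illustrative name and "valid" boundary handling, not an existing API): only every stride-th input position is evaluated, so the output is correspondingly smaller.

```julia
function strided_conv1d(x::AbstractVector, w::AbstractVector; stride::Int = 2)
    K = length(w)
    L = fld(length(x) - K, stride) + 1        # number of output samples
    out = similar(x, L)
    for i in 1:L
        offset = (i - 1) * stride             # skip `stride` input samples per output
        s = zero(eltype(x))
        for k in 1:K
            s += x[offset + k] * w[k]
        end
        out[i] = s
    end
    return out
end
```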

@Tokazama

Tokazama commented Jan 13, 2020

I thought I'd give a bit of an update on this for people who have expressed interest in tackling this issue.

We have https://github.com/FluxML/NNlib.jl for neural network filters in Julia. If we find a path forward to compatibility with it, we would get GPU support for NN filters and could more easily hook into a lot of the machine learning libraries in Julia. I'm not sure whether it will be possible to get native GPU code to handle/optimize colorant type calculations. It may be best to just decide at what point in the convolution it is worth converting to a Float32 array and back.

Also take a look at https://github.com/JuliaGPU/CuArrays.jl for some GPU interop. I don't think there's currently a way to fully interact with AMD GPUs from Julia's compiler yet, but that's being worked on at https://github.com/JuliaGPU/AMDGPUnative.jl.
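For reference, a hedged sketch of that "convert to Float32 and back" route, assuming NNlib's conv takes W×H×Cin×N inputs with KW×KH×Cin×Cout filters and that ImageCore's channelview is available (check the current NNlib docs before relying on the exact layout):

```julia
using NNlib, ImageCore

# Build a small RGB image, move it into the dense Float32 layout NNlib expects,
# and run a bank of 16 3×3 filters over its 3 colour channels.
img = RGB{Float32}.(rand(Float32, 64, 64), rand(Float32, 64, 64), rand(Float32, 64, 64))
x = permutedims(Float32.(channelview(img)), (2, 3, 1))  # C×W×H → W×H×C
x = reshape(x, size(x)..., 1)                           # add the batch dim: W×H×C×1
w = randn(Float32, 3, 3, 3, 16)                         # KW×KH×Cin×Cout
y = NNlib.conv(x, w)                                    # W′×H′×16×1
```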
