Cartesian Product iteration #1917

Closed
GlenHertz opened this issue Jan 8, 2013 · 17 comments
@GlenHertz (Contributor)

This is a follow-up to the discussion on the julia-users mailing list here.

There seemed to be good progress but I didn't see any code committed.

@JeffBezanson (Member)

In fact, I've been thinking of adding this to the language as follows:

for I in X...
    A[I...] = B[I...]
end

where X is an n-tuple of iterables. This expands to a nest of n for loops and delivers a tuple of elements in I. Other syntax/behavior ideas welcome.
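For reference, the semantics Jeff describes can be sketched with `Iterators.product`, which yields tuples drawn from each iterable. (This construct postdates the thread; the proposed `X...` loop syntax was never added, so this is an illustration of the intended behavior, not the proposal itself.)

```julia
# Sketch of the proposed semantics using Iterators.product (modern Julia).
# Each I is an n-tuple of elements, one from each iterable in X.
X = (1:2, 1:3)
A = zeros(Int, 2, 3)
B = reshape(1:6, 2, 3)
for I in Iterators.product(X...)
    A[I...] = B[I...]
end
```

As in the proposal, this expands to a nest of n loops, with the first iterable varying fastest.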

@StefanKarpinski (Member)

I like both the idea and the syntax. People are inevitably going to ask for support for X being other kinds of collections. Is that feasible or can we not infer their size well enough?

@JeffBezanson (Member)

It is feasible, since after all we can't infer the size of every tuple anyway. You just want to avoid spurious use of non-tuples, like {1:n,1:n} instead of (1:n,1:n). This can be implemented by calling the equivalent of gen_cartesian_map.

@timholy (Member)

timholy commented Jan 11, 2013

I like it a lot. Maybe an interesting use case would be to consider what it would look like to compute the discrete Laplacian of a multidimensional array this way. Presumably you'd have to do iteration over the components of I?
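For what it's worth, here is what that use case looks like with the `CartesianIndices` machinery that eventually landed in Base (a sketch; `laplacian` is a hypothetical name, and the components of `I` are reached by iterating over dimensions, as timholy anticipates):

```julia
# Sketch: discrete Laplacian of an N-dimensional array via CartesianIndices
# (modern Julia; this machinery postdates the thread).
function laplacian(A::AbstractArray{T,N}) where {T,N}
    L = zeros(T, size(A))
    R = CartesianIndices(A)
    I1 = oneunit(first(R))               # CartesianIndex of all ones
    for I in first(R)+I1:last(R)-I1      # interior points only
        for d in 1:N                     # iterate over the components of I
            e = CartesianIndex(ntuple(i -> i == d ? 1 : 0, N))
            L[I] += A[I+e] + A[I-e] - 2A[I]
        end
    end
    return L
end
```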

@johnmyleswhite (Member)

+1

@timholy (Member)

timholy commented Jul 13, 2013

Would you be able to do math on the index tuples? For example, suppose I wanted to implement imfilter for arrays of arbitrary dimensions: I might want something like this: (edit: rats, stray keypress had me submit early, continuing to edit)

function imfilter(X,K)
    center = ceil([size(K)...]/2)
    Y = similar(X)
    for I in size(X)...    # Note: have to handle boundaries somehow!
        s = zero(eltype(X))
        for J in size(K)...
            s += K[J...]*X[I...+J...-center...]
        end
        Y[I...] = s
    end
end

@StefanKarpinski (Member)

It would be something like ranges(X)... not size(X)... since that would iterate a single time with I equal to size(X). Also: man, that's a lot of dots.

@JeffBezanson (Member)

We could define things like interior(X, 2) to get various useful index ranges for use with ....
A syntax like X[(I.+J.-center)...] might work.
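As it turned out, the `CartesianIndex` type later added to Base supports exactly this arithmetic, so `I + J - center` works on whole index tuples without splatting. A sketch of the imfilter loop above in modern Julia (`imfilter_naive` is a hypothetical name; boundaries are handled crudely by skipping out-of-range taps):

```julia
# Sketch of the imfilter loop using CartesianIndex arithmetic (modern Julia).
# imfilter_naive is a hypothetical name; out-of-bounds taps are simply skipped.
function imfilter_naive(X, K)
    center = CartesianIndex(map(sz -> cld(sz, 2), size(K)))
    Y = zeros(eltype(X), size(X))
    RX = CartesianIndices(X)
    for I in RX
        s = zero(eltype(X))
        for J in CartesianIndices(K)
            IJ = I + J - center          # index arithmetic on whole tuples
            if IJ in RX                  # crude boundary handling
                s += K[J] * X[IJ]
            end
        end
        Y[I] = s
    end
    return Y
end
```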

@timholy (Member)

timholy commented Jul 14, 2013

Stefan, you're right of course.

Another thing to think about is that there may be powerful performance arguments to be able to reduce most array accesses to linear indexing. I seem to remember that 4d indexing was something like 20x slower than linear, when the accesses were in a pattern that allowed you to precalculate the outer dimensions. The imfilter example would probably be best implemented in terms of a list of coefficients and stride offsets.

But for boundary conditions you also have to keep track of the actual coordinates, and feed that into your linear indexing, so you do need cartesian iteration in parallel with linear indexing. If you imagine implementing reflecting, periodic, and NA boundary conditions (the latter meaning "just pretend any undefined elements don't exist, normalizing by the ones you have"), it seems like the sub2ind-style transformations could get fairly hairy. Certainly, this was a pain to handle in Grid.jl (which implements even more boundary conditions), although it eventually worked out.
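The per-dimension index transformations for these boundary conditions are simple in isolation; the pain described here comes from composing them with sub2ind-style linearization. For reference, hypothetical per-dimension helpers for the common cases (the names are illustrative, not from any package):

```julia
# Hypothetical per-dimension boundary-condition maps (names are illustrative):
# given a possibly out-of-range index i for a dimension of size n, return the
# in-range index to sample instead.
replicate(i, n) = clamp(i, 1, n)           # clamp to the nearest edge
periodic(i, n)  = mod1(i, n)               # wrap around
function reflect(i, n)                     # mirror: ..., 3, 2, 1, 2, 3, ...
    n == 1 && return 1
    j = mod(i - 1, 2n - 2)                 # period of the reflected sequence
    return j < n ? j + 1 : 2n - j - 1
end
```

The "NA" condition is different in kind: it changes the normalization rather than the index, so it cannot be expressed as an index map like these.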

@GunnarFarneback (Contributor)

For the record there's an imfilter implementation for arbitrary dimensions at
GunnarFarneback@2c3d599

You definitely want to do the inner loop over the kernel as a linear loop and in many cases handle boundary conditions with pre-padding.

@timholy (Member)

timholy commented Jul 15, 2013

A bit off-topic, but @GunnarFarneback, your imfilter is certainly better than the one in Images.jl. I'd welcome a pull request, if you're willing to share.

While linear indexing when you can do it will always be faster, I suspect my 20x number needs revision in light of e550406. Nice work, Jeff.

@GunnarFarneback (Contributor)

Sharing is no problem, deciding how to fit it in is more of one. But maybe it's time to get that implementation replaced.

Going back to topic, the bilateral filter may be a good concrete example of how to stress a Cartesian product language feature. In 2D a somewhat naive implementation looks like

function bilateral(in::Matrix{Float64}, kernel_size::Int,
                   sigma_s::Float64, sigma_i::Float64)
    k_s = -0.5 / sigma_s^2
    k_i = -0.5 / sigma_i^2
    kernel_radius = fld(kernel_size - 1, 2)
    out = similar(in)
    height = size(in, 1)
    width = size(in, 2)
    for x = 1:width
        for y = 1:height
            center_value = in[y, x]
            sum_w = 0.0
            sum = 0.0

            for dx = -kernel_radius:kernel_radius
                xi = x + dx
                if xi < 1 || xi > width
                    continue
                end
                for dy = -kernel_radius:kernel_radius
                    yi = y + dy
                    if yi < 1 || yi > height
                        continue
                    end

                    value = in[yi, xi]
                    diff = value - center_value
                    wi = exp(k_s * (dx^2 + dy^2) + k_i * diff^2)
                    sum_w += wi
                    sum += wi * value
                end
            end

            out[y, x] = sum / sum_w
        end
    end
    return out
end

It would be lovely if this could be written compactly for arbitrary dimensionality.
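With the `CartesianIndices` machinery that later landed in Base, the same filter can indeed be written compactly for any dimensionality. A sketch (`bilateral_nd` is a hypothetical name; variable names otherwise follow the 2D version above):

```julia
# Sketch: dimension-generic bilateral filter via CartesianIndices
# (modern Julia; bilateral_nd is a hypothetical name).
function bilateral_nd(img::AbstractArray{Float64}, kernel_size::Int,
                      sigma_s::Float64, sigma_i::Float64)
    k_s = -0.5 / sigma_s^2
    k_i = -0.5 / sigma_i^2
    r = fld(kernel_size - 1, 2)
    out = similar(img)
    R = CartesianIndices(img)
    I1 = oneunit(first(R))
    for I in R
        center_value = img[I]
        sum_w = 0.0
        s = 0.0
        # Intersect the kernel window with the array bounds.
        for J in max(first(R), I - r*I1):min(last(R), I + r*I1)
            value = img[J]
            diff = value - center_value
            d2 = sum(abs2, Tuple(J - I))   # squared spatial distance
            wi = exp(k_s * d2 + k_i * diff^2)
            sum_w += wi
            s += wi * value
        end
        out[I] = s / sum_w
    end
    return out
end
```

The `continue`-based boundary tests of the 2D version collapse into the `max`/`min` window intersection, and the `dx^2 + dy^2` term becomes a sum over the components of `J - I`.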

@timholy (Member)

timholy commented Jul 21, 2013

Just wanted to give another example here, showing how useful/necessary this will be for switching over to returning views.

It would be nice to write a sum function something like this:

function (+)(A::AbstractSubView, B::AbstractSubView)
    if size(A) != size(B)
        error("Dimension mismatch")
    end
    C = Array(promote_type(eltype(A), eltype(B)), size(A))
    for I in C...
        C[I...] = A[indexes(A,I)...] + B[indexes(B,I)...]
    end
    C
end

I suspect we won't be happy with the performance until this type of syntax generates code something like this:

# Implementation for 3d
function (+)(A::SubView, B::SubView)
    if size(A) != size(B)
        error("Dimension mismatch")
    end
    C = Array(promote_type(eltype(A), eltype(B)), size(A))
    s1 = size(C, 1)
    s2 = size(C, 2)
    strdC2 = s1
    strdC3 = s1*s2
    strdA2 = size(parent(A), 1)
    strdA3 = size(parent(A), 2)*strdA2
    strdB2 = size(parent(B), 1)
    strdB3 = size(parent(B), 2)*strdB2
    for i3 = 1:size(C, 3)
        o3 = (i3-1)*strdC3
        oA3 = (A.indexes[3][i3]-1)*strdA3
        oB3 = (B.indexes[3][i3]-1)*strdB3
        for i2 = 1:s2
            o2 = o3 + (i2-1)*strdC2
            oA2 = oA3 + (A.indexes[2][i2]-1)*strdA2
            oB2 = oB3 + (B.indexes[2][i2]-1)*strdB2
            for i1 = 1:s1
                C[i1+o2] = parent(A)[A.indexes[1][i1]+oA2] + parent(B)[B.indexes[1][i1]+oB2]
            end
        end
    end
    return C
end

Here I've not required that SubViews be strided, i.e., the indexes could be AbstractVectors.
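For comparison, once cartesian iteration landed in Base, the generic version could lean on `SubArray`'s own indexing and let specialization handle the stride bookkeeping. A sketch (`addviews` is a hypothetical name; `promote_type` stands in for the `promote` of the pseudocode above):

```julia
# Sketch: generic view + view addition via CartesianIndices (modern Julia;
# addviews is a hypothetical name).
function addviews(A::SubArray, B::SubArray)
    size(A) == size(B) || throw(DimensionMismatch("size(A) != size(B)"))
    C = Array{promote_type(eltype(A), eltype(B))}(undef, size(A))
    for I in CartesianIndices(C)
        C[I] = A[I] + B[I]   # the SubArray translates I to parent indexes
    end
    return C
end
```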

@timholy (Member)

timholy commented Sep 8, 2013

Since some discussions have started focusing on post-0.2, let me chime in with an update on this issue (which I think is an important 0.3 milestone).

First, let me correct something I said immediately above: in recent julias, cartesian indexing of multidimensional arrays is completely viable. (I suspect it's been this way ever since Jeff's "hoist array size" fix for #3712.) So a lot of the "convert multidimensional indices into a linear index" gymnastics we've done in the past simply aren't necessary, in my view. That simplifies many things, and in particular makes a tuple-of-indexes (as Jeff proposes above) the obvious foundation for Cartesian iteration.

Second, my Cartesian package serves two purposes: (1) it lets me do the things I need to do now, and (2) it's a research project for figuring out what facilities we need to have in base. Therefore, let me provide an update. Cartesian is now feeling quite full-featured, and I'm finding its current incarnation allows a very wide range of algorithms to be written efficiently and easily. As Jeff pointed out, the code is quite hard to read (see the example below), but it is so useful that I have internal code where the majority of my functions make heavy use of these facilities. I should note that the README is completely out-of-date, so if you're interested in seeing what's up you'll want to check the code. (I'm not certain it's worth writing a nice README now because it would be outdated once tools such as these move into base, and the internal in-code comments should help a lot.) I should also add that there could probably be a bit of consolidating---different capabilities were added incrementally---but neither would I describe it as "crufty." In other words, we may want an equivalent of at least half of the macros in Cartesian.

One of the main surprises in developing and using Cartesian has been how important it is to have a rich interface for specifying what happens in different dimensions. By far the most significant innovation was "inlined expression parsing" which allows one to generate expressions like this (this is a fake and useless example, but I use real stuff similar to this all the time):

k = 5
for i3 = 2:size(A, 3)-1, i2 = 2:size(A,2)-1, i1 = 2:size(A,1)-1
    B[i1, i2, k] += A[i1, i2, i3]
end

using the following code (here N=3):

k = 5
@nloops $N i d->2:size(A,d)-1 begin
    (@nref $N B d->(d==$N)?k:i_d) += (@nref $N A i)
end

Those anonymous function-expressions get "inlined" at parsing time, so this is literally translated into the efficient version above.

Such tricks allow you to implement reductions, broadcasting, nearest-neighbor searches, etc. almost trivially. (In particular, every example that's been mentioned in this issue so far is quite easy to implement.) Moreover, most of your code naturally ends up being cache-efficient, which as @lindahua has pointed out is the key to fast reductions. Finally, the resulting algorithms are fast for both Arrays and SubArrays, often with speedups of 100x for the latter (and without a major gap compared to Array).

Without the anonymous function-expression inlining, I found that I was much more limited in terms of what I could do---in terms of my own algorithms, developing this very simple functionality was a big breakthrough. So I'd argue that a facility like this will be an important consideration for whatever ends up in base.

If you want to play with Cartesian at all, I've provided function versions of all the macros to make it easy to see how things get translated (example: Cartesian._nloops(3, :i, :(d->(d==2)?7:k_d), :(f(A))))).
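These macros did eventually move into Base as `Base.Cartesian`, with essentially the interface shown above (in current syntax the ternary needs spaces: `d == 3 ? k : i_d`). A runnable sketch of the interior-loop pattern for N = 3 (`interior_sum3` is a hypothetical name):

```julia
using Base.Cartesian

# Sketch: sum over the interior of a 3D array with @nloops/@nref
# (Base.Cartesian, which absorbed the Cartesian.jl macros).
function interior_sum3(A)
    s = zero(eltype(A))
    @nloops 3 i d -> 2:size(A, d)-1 begin
        s += @nref 3 A i
    end
    return s
end
```

The `d -> ...` range expression is "inlined" at macro-expansion time, exactly as described above, producing the explicit triple loop.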

@timholy (Member)

timholy commented Sep 8, 2013

I should also add that I'm not advertising for users of Cartesian; it's definitely an at-your-own-risk and could-change-API-without-warning kind of package (like I say, it's a research project), and one that I fervently hope gets outdated soon.

@timholy (Member)

timholy commented Oct 11, 2013

Mentioned on the mailing list, but posting here so it doesn't get lost between now and when people have time to start thinking about this again: a more flexible syntax is here. It allows the iterator ranges to be specified and introduces a new keyword with to indicate coupled iteration variables. (As shown in the Cartesian README, the latter is useful particularly for reductions and broadcasting, and can also be used to implement efficient mapslices behaviors.) It uses a generalized comprehension notation for most of this, see also #4470.

@timholy (Member)

timholy commented Mar 1, 2014

If we implement APL indexing (#5949), we'll need either this or #5395. That is, unless we restrict the indexes to something for which linear indexing is efficient. I think it's simply infeasible otherwise.

A brief update on where we stand on this issue, relative to this gist (I'm excising JIT warmups, etc.):

julia> A = rand(500,500,30);

julia> @time sumloop(A,30)
elapsed time: 0.407903035 seconds (64 bytes allocated)
1.1249159436884916e8

julia> @time sumtuple(A,30)
elapsed time: 0.783183949 seconds (64 bytes allocated)
1.1249159436884916e8

Pretty impressive, although I suspect some would be loath to give up a factor of 2.

But the more relevant case reveals we're not there yet:

julia> @time sumtuple_gen(A,1)
elapsed time: 6.926610589 seconds (1740671904 bytes allocated)
3.749719812291454e6

More than 500-fold slower (note it's doing 30-fold less work).

@timholy timholy mentioned this issue Mar 2, 2014
timholy added a commit that referenced this issue Sep 20, 2014
waTeim pushed a commit to waTeim/julia that referenced this issue Nov 23, 2014