BitArray broadcasting #6035
Conversation
I'm not sure that I will recognize … It would be nice to understand why it becomes slow when doing actual broadcasting. Anyway, thanks for doing this!
First, these are some very nice fixes for addition of bools, etc. The performance challenges are indeed unfortunate. I'm well aware that this is extremely challenging code to write, and I am amazed by the efficiency of what you've pulled off in other domains. But I can't help thinking that, because the arrays are traversed in order, there must be an efficient solution here too. I confess I don't even understand the storage of BitArrays. Presuming that they are densely packed, a stride along dimension 1 won't work. In that case, I wonder about using the post-expression to do something like this (3d example) for each BitArray: … where … This is still pretty vague in the details, but perhaps it's a viable direction?
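To make the storage question concrete: Julia's `BitArray` densely packs its boolean values, 64 per 64-bit chunk, and addresses them by a single linear index. Here is a minimal illustrative sketch in Python (the names and class are made up for illustration; this is not Julia's implementation):

```python
# Illustrative sketch of dense bit packing (not Julia's actual BitArray
# code): 64 booleans per machine word, addressed by a linear index.
CHUNK_BITS = 64

class PackedBits:
    def __init__(self, n):
        self.n = n
        self.chunks = [0] * ((n + CHUNK_BITS - 1) // CHUNK_BITS)

    def get(self, i):
        # linear index i -> (chunk, bit within chunk)
        c, b = divmod(i, CHUNK_BITS)
        return ((self.chunks[c] >> b) & 1) == 1

    def set(self, i, v):
        c, b = divmod(i, CHUNK_BITS)
        if v:
            self.chunks[c] |= 1 << b
        else:
            self.chunks[c] &= ~(1 << b)

p = PackedBits(100)
p.set(70, True)
print(p.get(70), p.get(71))  # True False
```

This also illustrates why striding along a dimension is awkward for this layout: consecutive logical elements share a machine word, so any non-linear access pattern pays for the shift-and-mask arithmetic on every element.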
I should say that I'd also be comfortable with the notion of restricting broadcasting to two arrays when a BitArray is involved: … and I'm not sure I'd wish it on anyone to have to generalize that to more inputs.
Come to think of it, there's an alternative to writing specialized versions. We could add one more element to the dictionary …
Ok, I think I made it (see last commit). @timholy, the key observation was indeed that the arrays are traversed in order; I tried a plethora of different approaches, but in the end the fastest is to use linear indexing. So, unless there are objections, I'm going to merge this tomorrow.
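A hedged sketch of why in-order traversal helps: each input can keep its own running linear index, advanced with per-dimension strides, where a broadcast (extent-1) dimension simply gets stride 0. This is an illustrative Python reconstruction of the idea, not the PR's Julia code; strides are row-major to match flat Python lists, whereas Julia is column-major, but the arithmetic is the same.

```python
from itertools import product

def strides_for(shape):
    # row-major strides for a flat Python list
    st, acc = [], 1
    for d in reversed(shape):
        st.append(acc)
        acc *= d
    return tuple(reversed(st))

def bcast_strides(shape):
    # stride 0 along broadcast (extent-1) dimensions: the input's
    # linear index simply doesn't advance there
    return tuple(0 if s == 1 else st
                 for s, st in zip(shape, strides_for(shape)))

def bcast_add(a, ashape, b, bshape, out_shape):
    sa, sb = bcast_strides(ashape), bcast_strides(bshape)
    out = []
    for idx in product(*map(range, out_shape)):  # in-order traversal
        ia = sum(i * s for i, s in zip(idx, sa))
        ib = sum(i * s for i, s in zip(idx, sb))
        out.append(a[ia] + b[ib])
    return out

# a (1,3) array broadcast against a (2,1) array -> (2,3) result
print(bcast_add([1, 2, 3], (1, 3), [10, 20], (2, 1), (2, 3)))
# [11, 12, 13, 21, 22, 23]
```

Because the output is written in order, the result's own index is just a counter, and each input index is a cheap running sum, with no multidimensional indexing in the inner loop.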
I'm totally on board with this. For BitArrays, I agree with what I believe is your view, that views don't make sense. So linear indexing is fine.
Tim, I might have misinterpreted your last comment, but just to clarify: in the current version of this PR, the array type is not taken into account; the code is written generically for AbstractArray arguments, in terms of … So this PR is on hold again at the moment.
Update: I experimented a bit with SubArrays, and found out that … This is because we can use the index tuple as iteration state, and if everything is manually inlined it basically does the same thing as cartesian. See this gist: the last test function (manually inlined iteration based on custom immutable states) is as fast as the first (standard cartesian loops). Without manual inlining, the penalty is about a factor of 2. In any case, it doesn't seem that this is feasible right at this moment. It might be worth revisiting after some of the work being done on points 1 and 2 above gets merged. Perhaps for the time being I could write specialized broadcast versions, e.g. using cartesian iteration when all arguments are StridedArrays and start/next/done iteration otherwise, or something like that. I think that with some experiments I may be able to determine the best strategy in each case.
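The tuple-as-iteration-state pattern under discussion can be sketched as follows. The function names echo Julia's old start/next/done iteration protocol, but this is illustrative Python, not the gist's code; the first dimension varies fastest, as in Julia's column-major order.

```python
# Sketch of start/next/done-style cartesian iteration where the state
# is an immutable tuple of zero-based indices (illustrative only).

def start(shape):
    return (0,) * len(shape)

def next_state(shape, state):
    s = list(state)
    for d in range(len(s)):          # first dimension varies fastest
        s[d] += 1
        if s[d] < shape[d] or d == len(s) - 1:
            break                    # no carry needed (or we overflowed)
        s[d] = 0                     # carry into the next dimension
    return tuple(s)

def done(shape, state):
    return state[-1] >= shape[-1]    # last dimension overflowed

shape = (2, 3)
state = start(shape)
visited = []
while not done(shape, state):
    visited.append(state)
    state = next_state(shape, state)
print(visited)
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Because the state is a small immutable value, a sufficiently aggressive inliner can keep it entirely in registers, which is the optimization being discussed: with manual inlining this matches plain cartesian loops, and without it there is roughly a 2x penalty.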
I forgot to add: the proof of principle that efficient tuples + more inlining could provide fast iteration …
Ah yes, that's it, I missed that, thanks for pointing it out. Those iteration functions are exactly what I also wrote when testing tuples (not shown in the gist). As I said, if you also substitute tuples with immutable types, the 2.5-fold performance penalty which you measured disappears completely.
Can you clarify what you mean? Do you mean the iterator state itself is an immutable? I was hoping we wouldn't have to generate a separate type for each dimensionality. But if needed, so be it.
Oh, and that's quite exciting about getting rid of the 2.5x penalty! Will be nice to finally have some good multidimensional iterators, once that inlining branch gets merged. |
Yes, I got rid of the difference in performance by using an immutable iterator state, as in the gist, where they are generated up to dimension 10 (see the comment at the top to see what they end up looking like). But I'd rather wait for tuples to finally be treated like immutable types (which I believe is a long-standing issue) rather than introduce such utter ugliness into Base. Of course, others may feel differently, especially if that optimization is not in sight.
For example, now `[true].+[true]` is the same as `[true]+[true]`, i.e. `[2]`. As for Arrays, this is obtained by special-casing Bools.
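The promotion convention here is the same one Python happens to use, where `bool` is a subtype of `int` and arithmetic on booleans yields integers; a quick illustration (in Python, not Julia):

```python
# Booleans promote to integers under arithmetic, which is the behavior
# the commit gives Julia's .+ and .- on Bools.
print(True + True)                                # 2
print([a + b for a, b in zip([True], [True])])    # [2]
print(True - True)                                # 0
```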
renamed as `gen_broadcast_function`, since its purpose is different from the exported version
Closes #5877. This also specializes `broadcast!` when the target is a BitArray, and adds an (unexported) `bitbroadcast` function which creates a BitArray target rather than an Array. Also, some comparison operations between `AbstractArray{Bool}`s are specialized where possible.
also specialize `Bool^Integer` while at it
simply by removing a method!
addresses #3171
introduces specialized versions of the broadcasting core which use start/next instead of cartesian iteration. Used when the arguments only involve Arrays or BitArrays.
My tests showed that as soon as a SubArray was involved, the cartesian iteration version was the one performing best. So in the last commit there are 4 versions of the broadcasting core. I also added … I think this is ready for merging; I'll just wait for Travis to turn green.
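The strategy selection described above can be pictured as a simple dispatcher. The class names and function below are hypothetical Python stand-ins; the real PR selects the specialized core via Julia's multiple dispatch on the argument types.

```python
# Hypothetical stand-in types, illustrating "pick the iteration
# strategy from the argument types" (not the PR's actual dispatch).
class Array: pass
class BitArray: pass
class SubArray: pass

def choose_strategy(args):
    if all(isinstance(a, (Array, BitArray)) for a in args):
        return "linear"      # start/next over linear indices
    return "cartesian"       # general cartesian iteration

print(choose_strategy([Array(), BitArray()]))   # linear
print(choose_strategy([Array(), SubArray()]))   # cartesian
```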
@carlobaldassi, I think given that the tuple stuff still hasn't materialized, I'd lobby strongly for inclusion of your gist in base rather than waiting for #6437. We can switch to using tuples when that's feasible. |
This fixes #5877 and touches related issues, in that it makes element-wise comparisons (`.==`, `.<`, etc.) broadcasting, and also completes broadcasting behaviour for all element-wise operations which involve BitArrays (`.*`, `./`, etc.), unless I missed something.

Besides that, there is some minor stuff in here, the most relevant being that it fixes `.+` and `.-` for Bools, which now return `Array{Int}`. There's also a specialization for `Bool^Int`.

This is a pull request even though to me it's ready for merging, mainly to highlight some issues I found. Here's a list of reasons why I didn't merge it directly:

1. Since broadcast is @toivoh's (and others') creature, review is in order.
2. The code adds a specialization for `broadcast!(f::Function, B::BitArray, ...)`, but adds another function, `bitbroadcast`, which is not exported: should it be exported?
3. Performance is good, unless broadcasting is actually needed (i.e. `broadcast_shape` yields something different from `promote_shape`) and BitArrays are involved in the argument list.
   3a. In the case `promote_shape` succeeds, I added fallbacks which make things efficient. One possibly questionable issue is my use of `try` blocks for that purpose.
   3b. In case it fails, and BitArrays are involved (e.g. `trues(5) .+ falses(1,5)`), cartesian iteration kills performance, to the point that it is much faster to invoke `bitunpack` on the arguments first; however, that may clearly become very memory-expensive. I haven't been able to overcome this issue yet, so advice would be welcome on this. (One might imagine unpacking below a certain size, but that is not a very satisfactory solution.)

Example of timings for the last issue: here's a specialized version which doesn't truly broadcast: … And here's one where it does: … Similar timings are obtained for comparison operations like `.==`.

Part of the problem is that `@inbounds` is ineffective for BitArrays, but I don't think it's only that (even if it would be nice to have some way of making it work, cc @JeffBezanson). Probably, with a sufficient effort, one could come up with some (contrived) way out of this by avoiding multidimensional indexing for BitArrays, but even in that case it is hard to think of a solution for general AbstractArrays (e.g. comparing a `BitArray` with an `Array`).
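For readers unfamiliar with the two shape functions contrasted in point 3, here is an illustrative Python sketch of the distinction: `promote_shape` demands matching shapes, while `broadcast_shape` combines shapes, letting extent-1 dimensions expand (padding with trailing singleton dimensions, as Julia does). These are simplified stand-ins, not Julia's implementations.

```python
def promote_shape(a, b):
    # succeeds only when the shapes agree (simplified: exact equality)
    if a != b:
        raise ValueError("shapes do not match")
    return a

def broadcast_shape(a, b):
    n = max(len(a), len(b))
    a = a + (1,) * (n - len(a))   # pad with trailing singleton dims
    b = b + (1,) * (n - len(b))
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError("incompatible shapes")
    return tuple(out)

# trues(5) .+ falses(1,5): promote_shape fails, but broadcast_shape
# yields a genuinely broadcast (5,5) result
print(broadcast_shape((5,), (1, 5)))    # (5, 5)
print(broadcast_shape((2, 3), (2, 3)))  # (2, 3)
```

When the two functions agree, no true broadcasting happens and the fast same-shape fallback applies; it is exactly the case where they differ, with BitArrays in the argument list, that hits the slow path described above.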