Commit

Drop size 0 arrays in concatenate (dask#4167)

* Test `da.concatenate` with size 0 array

Make sure that `da.concatenate` does not include empty arrays in the
result as they don't contribute any data.

* Drop size 0 arrays from `da.concatenate`

If any of the arrays passed to `da.concatenate` has a size of 0, it won't
contribute anything to the array created by concatenation. As such, drop
any size 0 arrays from the sequence of arrays to concatenate before
proceeding (a minimal sketch of this approach follows the commit list
below).

* Handle dtype and the all-size-0 case

* Cast inputs with asarray

* Coerce all arrays to concatenate to the same type

* Drop obsoleted type handling code

* Comment on why arrays are being dropped

* Use `np.promote_types` for parity with old behavior

* Handle endianness during type promotion (see the dtype illustration after the diffs below)

* Construct empty array of right type

Avoids a later cast and the addition of another node to the graph.

* Promote types in `concatenate` using `_meta`

There was some leftover type promotion code that worked from the arrays'
`dtype`s. However, this should now use the `_meta` information instead,
since that is available.

* Ensure `concatenate` is working on Dask Arrays

* Raise `ValueError` if `concatenate` gets no arrays

NumPy raises if no arrays are provided to concatenate, as it is unclear
what to do. This adds a similar exception for Dask Arrays. It also
short-circuits the handling of unusual cases later and gives a clearer
error than the one a user would otherwise see.

* Test `concatenate` raises when no arrays are given

* Determine the concatenated array's shape

Needed to handle the case where all arrays have trivial shapes.

* Handle special sequence cases together

* Update dask/array/core.py

Co-Authored-By: James Bourbeau <[email protected]>

* Drop outdated comment

* Assume valid `_meta` in `concatenate`

Simplifies the `_meta` handling logic in `concatenate` to assume that
`_meta` is valid. As all arguments have been coerced to Dask Arrays,
this is a reasonable assumption to make.
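
Taken together, the commits above amount to a small recipe: derive the
output dtype from zero-size meta arrays, compute the output shape up
front, and only then drop the size 0 inputs. The sketch below is a
minimal NumPy-only illustration of that recipe; `concat_drop_empty` is a
hypothetical helper invented for this page, and it ignores dask-specific
concerns such as chunking, `_meta` backends, and graph construction.

import numpy as np


def concat_drop_empty(arrays, axis=0):
    # Hypothetical NumPy-only sketch of the approach described above.
    if not arrays:
        raise ValueError("Need array(s) to concatenate")

    # Zero-size "meta" arrays carry dtype information but no data, so
    # concatenating them yields the promoted output dtype cheaply.
    metas = [np.empty((0,) * a.ndim, dtype=a.dtype) for a in arrays]
    meta = np.concatenate(metas)

    ndim = arrays[0].ndim
    if axis < 0:
        axis += ndim

    # Compute the output shape before dropping anything, so the all-empty
    # case can still produce a result of the right shape.
    shape = tuple(
        sum(a.shape[i] for a in arrays) if i == axis else arrays[0].shape[i]
        for i in range(ndim)
    )

    # Size 0 arrays contribute no data; cast the rest to the promoted dtype.
    arrays = [a.astype(meta.dtype) for a in arrays if a.size]

    if not arrays:
        # Everything was empty: return an empty array of the promoted dtype.
        return np.empty(shape=shape, dtype=meta.dtype)
    if len(arrays) == 1:
        return arrays[0]
    return np.concatenate(arrays, axis=axis)


# Mirrors the behavior exercised by the new test below: the size 0 float
# array is dropped but still influences the output dtype.
out = concat_drop_empty([np.zeros(0, dtype=float), np.zeros(1, dtype=int)])
assert out.dtype == np.float64 and out.shape == (1,)
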
jakirkham authored and jrbourbeau committed Jun 13, 2019
1 parent 46aef58 commit 66531ba
Showing 2 changed files with 43 additions and 11 deletions.
32 changes: 21 additions & 11 deletions dask/array/core.py
@@ -2981,12 +2981,26 @@ def concatenate(seq, axis=0, allow_unknown_chunksizes=False):
     """
     seq = [asarray(a) for a in seq]
 
-    n = len(seq)
-
-    if n == 0:
+    if not seq:
         raise ValueError("Need array(s) to concatenate")
 
+    from .utils import meta_from_array
+    metas = [s._meta for s in seq]
+    metas = [meta_from_array(m, m.ndim) for m in metas]
+    meta = np.concatenate(metas)
+
+    # Promote types to match meta
+    seq = [a.astype(meta.dtype) for a in seq]
+
+    # Find output array shape
+    ndim = len(seq[0].shape)
+    shape = tuple(
+        sum((a.shape[i] for a in seq)) if i == axis else seq[0].shape[i]
+        for i in range(ndim)
+    )
+
+    # Drop empty arrays
+    seq = [a for a in seq if a.size]
+
     if axis < 0:
         axis = ndim + axis
@@ -2995,7 +3009,10 @@ def concatenate(seq, axis=0, allow_unknown_chunksizes=False):
                "\nData has %d dimensions, but got axis=%d")
         raise ValueError(msg % (ndim, axis))
 
-    if n == 1:
+    n = len(seq)
+    if n == 0:
+        return from_array(np.empty(shape=shape, dtype=meta.dtype))
+    elif n == 1:
         return seq[0]
 
     if (not allow_unknown_chunksizes and
@@ -3012,13 +3029,6 @@ def concatenate(seq, axis=0, allow_unknown_chunksizes=False):
     for i, ind in enumerate(inds):
         ind[axis] = -(i + 1)
 
-    from .utils import meta_from_array
-    metas = [getattr(s, '_meta', s) for s in seq]
-    metas = [meta_from_array(m, getattr(m, 'ndim', 1)) for m in metas]
-    meta = np.concatenate(metas)
-
-    seq = [a.astype(meta.dtype) for a in seq]
-
     uc_args = list(concat(zip(seq, inds)))
     _, seq = unify_chunks(*uc_args, warn=False)

22 changes: 22 additions & 0 deletions dask/array/tests/test_array_core.py
@@ -427,6 +427,28 @@ def test_concatenate_fixlen_strings():
               da.concatenate([a, b]))
 
 
+def test_concatenate_zero_size():
+
+    x = np.random.random(10)
+    y = da.from_array(x, chunks=3)
+    result_np = np.concatenate([x, x[:0]])
+    result_da = da.concatenate([y, y[:0]])
+    assert_eq(result_np, result_da)
+    assert result_da is y
+
+    # dtype of a size 0 arrays can affect the output dtype
+    result_np = np.concatenate([np.zeros(0, dtype=float), np.zeros(1, dtype=int)])
+    result_da = da.concatenate([da.zeros(0, dtype=float), da.zeros(1, dtype=int)])
+
+    assert_eq(result_np, result_da)
+
+    # All empty arrays case
+    result_np = np.concatenate([np.zeros(0), np.zeros(0)])
+    result_da = da.concatenate([da.zeros(0), da.zeros(0)])
+
+    assert_eq(result_np, result_da)
+
+
 def test_block_simple_row_wise():
     a1 = np.ones((2, 2))
     a2 = 2 * a1
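
The `np.promote_types` and endianness commits above concern how the
output dtype is chosen. The snippet below is an illustration only (it is
not part of the diff), assuming the mixed-endian dtypes shown: for these
inputs, concatenating zero-size arrays picks the same dtype as an
explicit `np.promote_types` reduction normalized to native byte order
with `newbyteorder("=")`.

import numpy as np
from functools import reduce

a = np.zeros(0, dtype=">i4")  # big-endian int32, size 0
b = np.zeros(3, dtype="<f4")  # little-endian float32

# dtype obtained by concatenating zero-size "meta" arrays ...
meta_dtype = np.concatenate([a[:0], b[:0]]).dtype

# ... versus an explicit promote_types reduction forced to native order.
promoted = reduce(np.promote_types, [x.dtype for x in (a, b)])
promoted = promoted.newbyteorder("=")

print(meta_dtype, promoted)  # float64 float64
assert meta_dtype == promoted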
