RFC: add support for repeating each element of an array #654

Closed
kgryte opened this issue Jul 13, 2023 · 15 comments · Fixed by #690

@kgryte
Contributor

kgryte commented Jul 13, 2023

This RFC proposes adding support to the array API specification for repeating each element of an array.

Overview

Based on array comparison data, the API is available in most array libraries. The main exception is PyTorch, which deviates in its naming convention (repeat_interleave vs. NumPy et al.'s repeat).

Prior art

Proposal

def repeat(x: array, repeats: Union[int, Sequence[int], array], /, *, axis: Optional[int] = None)
  • repeats: the number of repetitions for each element.

    If axis is not None,

    • if repeats is an array, repeats.shape must broadcast to x.shape[axis].
    • if repeats is a sequence of ints, len(repeats) must broadcast to x.shape[axis].
    • if repeats is an integer, repeats must be broadcast to match the size of the specified axis.

    If axis is None,

    • if repeats is an array, repeats.shape must broadcast to prod(x.shape).
    • if repeats is a sequence of ints, len(repeats) must broadcast to prod(x.shape).
    • if repeats is an integer, repeats must be broadcast to match the size of the flattened array.
  • axis: specifies the axis along which to repeat values. If None, use a flattened input array and return a flat output array.
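
For concreteness, here is a minimal usage sketch of the proposed semantics. It uses NumPy's existing np.repeat, which already behaves as proposed, as a stand-in for the standardized function:

    import numpy as np  # np.repeat already implements the proposed semantics

    x = np.array([[1, 2], [3, 4]])

    # Scalar repeats with axis=None: flatten in row-major order, repeat each element twice.
    np.repeat(x, 2)                       # array([1, 1, 2, 2, 3, 3, 4, 4])

    # Per-element repeats along an axis: len(repeats) matches x.shape[0].
    np.repeat(x, [1, 3], axis=0)          # rows: [1, 2], [3, 4], [3, 4], [3, 4]

    # repeats as an integer array whose length matches the flattened size.
    np.repeat(x, np.array([1, 0, 2, 1]))  # array([1, 3, 3, 4])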

Questions

  • Both PyTorch and JAX support a kwarg for specifying the output size in order to avoid stream synchronization (PyTorch) and to allow compilation (JAX). Without such kwarg support, is this API viable? And what are the reasons for needing this kwarg when other array libraries (TensorFlow) omit such a kwarg?
  • When flattening the input array, flatten in row-major order? (precedent: nonzero)
  • Is PyTorch okay adding a repeat function in its main namespace, given the divergence in behavior for torch.Tensor.repeat, which behaves similarly to np.tile? (A short comparison sketch follows this list.)
  • CuPy only allows int, List, and Tuple for repeats, not an array. PyTorch may prefer a list of ints (see Unnecessary cuda synchronizations that we should remove in PyTorch pytorch/pytorch#108968).
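
To make the third question concrete, a small comparison sketch of current NumPy and PyTorch behaviour (as I understand it):

    import numpy as np
    import torch

    x = np.array([1, 2, 3])
    t = torch.tensor([1, 2, 3])

    np.repeat(x, 2)                 # array([1, 1, 2, 2, 3, 3])   -- per-element repetition
    t.repeat(2)                     # tensor([1, 2, 3, 1, 2, 3])  -- tiles the whole tensor, like np.tile
    torch.repeat_interleave(t, 2)   # tensor([1, 1, 2, 2, 3, 3])  -- matches np.repeat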

Related

@kgryte kgryte added API extension Adds new functions or objects to the API. topic: Manipulation Array manipulation and transformation. labels Jul 13, 2023
@kgryte kgryte added this to the v2023 milestone Jul 13, 2023
@rgommers
Member

rgommers commented Jul 13, 2023

Both PyTorch and JAX support a kwarg for specifying the output size in order to avoid stream synchronization (PyTorch) and to allow compilation (JAX). Without such kwarg support, is this API viable? And what are the reasons for needing this kwarg when other array libraries (TensorFlow) omit such a kwarg?

I think because the repeats and axis keywords may not be int literals but may instead be derived from other variables. At that point, they can't be known statically. EDIT: it looks like one more case of "output shape depends on value of an input parameter".
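
For reference, a sketch of how those keywords look today in JAX and PyTorch (exact signatures may change; both calls produce [1, 2, 2, 3, 3, 3]):

    import jax.numpy as jnp
    import torch

    # JAX: total_repeat_length gives jit a static output shape to trace with.
    jnp.repeat(jnp.array([1, 2, 3]), jnp.array([1, 2, 3]), total_repeat_length=6)

    # PyTorch: output_size avoids a device sync when repeats lives on an accelerator.
    torch.repeat_interleave(torch.tensor([1, 2, 3]),
                            torch.tensor([1, 2, 3]),
                            output_size=6)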

When flattening the input array, flatten in row-major order? (precedent: nonzero)

I think we should always do that. There are more places where flattening happens, and they should always be row-major I'd think.

Is PyTorch okay adding a repeat function in its main namespace, given the divergence in behavior for torch.Tensor.repeat, which behaves similarly to np.tile?

Probably yes, but it's a relatively slow process to do so. The issue for it is pytorch/pytorch#50013, and in the tracker issue for numpy compat linked from there, it says:

torch.Tensor.repeat is actually np.tile, not np.repeat

    TODO: warn that users should use tile instead of repeat
    TODO (Release 2): remove torch.Tensor.repeat
    TODO (Release 3): implement torch.repeat to be compatible with np.repeat

Of course the longer that is not executed on, the less appetite there will be to do it later on.

@rgommers
Member

I think because the repeats and axis keywords may not be int literals but may instead be derived from other variables. At that point, they can't be known statically. EDIT: it looks like one more case of "output shape depends on value of an input parameter".

Strangely enough, PyTorch and JAX don't have the same keyword for tile (xref gh-655).

@rgommers
Member

I looked at the PyTorch situation again; it looks a little annoying. torch.repeat could in principle be added now, with behavior compatible with np.repeat and not with the Tensor.repeat method. I think the equally annoying thing is that it's not supported as of now, so it's just more work and no one is lining up to do that work.

@lezcano do you have any thoughts on this one?

@lezcano
Contributor

lezcano commented Jul 27, 2023

Having a method and a free function that have different behaviour is not a good idea (if I understood the proposal correctly). The whole update process is also a no-go, as we'd need to do it over 3 releases, and the time between releases has now gone up (unofficially) to almost a year.

Now, this should not stop the array API from standardising this, as this function is pretty much repeat_interleave. One should just be careful to cast the tuple into a tensor and populate the output_size kwarg correctly.

nb. it doesn't make any sense that in PyTorch's repeat_interleave the repeats arg is a tensor and not a tuple...
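
As a rough illustration of the approach described above (a sketch only, not PyTorch's actual implementation; edge-case handling is simplified):

    from typing import Optional, Sequence, Union

    import torch

    def repeat(x: torch.Tensor,
               repeats: Union[int, Sequence[int], torch.Tensor],
               /, *, axis: Optional[int] = None) -> torch.Tensor:
        # Precompute output_size so repeat_interleave does not have to
        # synchronize just to learn the length of the result.
        if isinstance(repeats, int):
            n = x.numel() if axis is None else x.shape[axis]
            output_size = repeats * n
        else:
            repeats = torch.as_tensor(repeats, device=x.device)
            output_size = int(repeats.sum())  # forces a sync if repeats is on a GPU
        return torch.repeat_interleave(x, repeats, dim=axis, output_size=output_size)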

@oleksandr-pavlyk
Contributor

I argue we should allow the value of repeats to be an array as well (we should not insist that it be a tuple of ints).

Offloading libraries must convert this vector and transfer data to an accelerator during the implementation anyway. Supporting array input would unlock certain use cases, such as programmatic generation of repeats (random sampling, procedural creation, etc.).

@kgryte
Contributor Author

kgryte commented Sep 7, 2023

@oleksandr-pavlyk I think this is a bug in the proposal. repeats should be allowed to be either an int or an array of ints.

@vadimkantorov

Supporting arrays is also useful for cases like PyTorch, where there might be improvements if the repeats/counts argument is already a tensor obtained from elsewhere, and especially if it's already on the target device.

@lezcano
Contributor

lezcano commented Sep 10, 2023

@vadimkantorov What do you mean by "there might be improvements"? There are no improvements, and it may even be detrimental to pass a tensor to this function if it's in CUDA.
If you think otherwise, it'd be useful to show concrete examples.

@leofang
Contributor

leofang commented Sep 22, 2023

Hi all, this RFC (and PR #690) was briefly discussed in the Array API meeting yesterday. Apologies to @kgryte, @oleksandr-pavlyk, and @seberg for comments I made in the meeting without fully understanding the sticking points here. Below is my updated analysis & response.

As already pointed out earlier in this thread, repeat() is challenging in that the returned shape depends dynamically on the inputs, unlike most APIs that have a predetermined contract amenable to static analysis. So this remains an unaddressed question (already documented in the RFC).

Moreover, with regard to the repeats argument taking an array object, I made a comment yesterday that stream ordering should help. In theory this would be true, but it involves implementation details of an array library. Specifically, array metadata (shape/strides/...) is usually stored on the host (as in CuPy/PyTorch), so in such cases a blocking D2H copy is unavoidable for these libraries to get this info on the host and allocate an output array accordingly, and preserving stream ordering is not sufficient.

However, again, some changes in the implementation can help. For example, if the array library can store array metadata in CUDA unified, pinned, or heterogeneous memory, preserving stream ordering alone is sufficient and no blocking copy is needed. (Consider that the following CUDA operations are ordered: repeats created on device -> array allocated by zero-copy accessing repeats from the host via a stream-wait event & callback.) So it's not the end of the world, and array libraries can evolve to work around implementation hurdles.

API-wise, there are no issues. The semantics are well defined, pending the row-major discussion, which is a separate topic.

CuPy chose not to support taking arrays for repeats for the same reason (cupy/cupy#1918, cupy/cupy#3849); sorry I didn't answer it right yesterday. But, since it's generally not possible to guarantee (despite our best effort) asynchronicity for all APIs, I am not convinced we cannot change this should the standard accept it (cc: @kmaehashi). On the CuPy side, we can always document that a certain combination of arguments may sync the device.

It has been a longstanding, recurrent user request to support repeats taking an array (see the above CuPy issues for example; this RFC and earlier comments in this thread raise similar requests). I believe it is the right choice for user ergonomics. I disagree with @lezcano that

There are no improvements

The UX improvement is large and also anticipated by many downstream developers and users. So, as long as we address the dynamic shape issue (I suspect we'd end up adding a remark, similar to what we did for the unique*() APIs), I would support it.

Since @jpivarski had kindly offered a partial workaround to CuPy, I'd like to have Jim chime in. I'd also be eager to have JAX developers involved; maybe @jakevdp can offer insight?

@lezcano
Contributor

lezcano commented Sep 22, 2023

What about libraries that create compiled code, like JAX or torch.compile, or libraries that use CUDA graphs? In this function, you cannot even give a bound on the size of the output without looking at repeats, so this function cannot be compiled, certainly not into a CUDA graph.

Regarding UX, my point is that if you have a repeats tensor, you can always call tensor.tolist() in PyTorch and pass the result to this function. This is in line with "explicit is better than implicit".
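
A tiny sketch of that explicit pattern (the host transfer in .tolist() is exactly the cost seberg raises below):

    import torch

    x = torch.tensor([10, 20, 30])
    repeats = torch.tensor([1, 2, 3])   # e.g. computed on-device earlier

    counts = repeats.tolist()           # explicit, blocking device-to-host copy: [1, 2, 3]
    out = torch.repeat_interleave(x, torch.tensor(counts, device=x.device))
    # out: tensor([10, 20, 20, 30, 30, 30])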

@leofang
Contributor

leofang commented Sep 22, 2023

I argue that belongs to the dynamic shape issue, which is a separate discussion that we can/should certainly have. But given that the array API has been dealing with it already, repeat() is not the first instance and should stick to the same design principles that we apply throughout the standard.

@vadimkantorov

For these cases, it might also be good to allow the user to pass in out= or some other way to specify the maximum allowable shape (maybe as was done with unique). Often in practice, the user will have set some upper bound.

@seberg
Contributor

seberg commented Sep 22, 2023

Thanks for digging into that, @leofang; so the problem is the synchronization due to the unknown size (and maybe also the fact that the array may be mutated).

From a NumPy perspective, I have to say I find it very strange to ask the user to call .tolist() on a potentially large integer array, something that makes certain use cases tricky/slow (to the point where I wonder how many usages in library contexts become awkward).

@oleksandr-pavlyk
Contributor

Nit: it should probably say "if repeats is an array, repeats.shape must broadcast to (prod(x.shape),)"

@kgryte
Contributor Author

kgryte commented Oct 19, 2023

Okay, I think we are getting to some semblance of consensus here. Namely,

  • repeat should support accepting an int, a sequence of ints, or an integer array. Accepting an array aligns with user expectations, is already supported by most array libraries (NumPy, TF, JAX, et al), and is a requested feature in other libraries lacking support for providing an array.
  • we should add a note in the specification (and in the documentation of array libraries supporting GPUs) that providing an array can result in potential performance degradation due to device synchronization when computing the output array size.

Are we okay with this path forward?

cc @seberg @leofang @lezcano @oleksandr-pavlyk
