Add a note on alignment requirement of data pointer #83

Merged (1 commit) on Nov 9, 2021

Conversation

tqchen merged commit 98861a5 into dmlc:main on Nov 9, 2021
rgommers deleted the note-on-alignment branch on November 9, 2021 14:33
seberg (Contributor) commented Nov 9, 2021

A few questions:

  • "allocation pointer" feels a bit unclear, but is probably fine. Let's say we are on the CPU: when should I use byte_offset (or when must I use it)? If you create a view into another array (NumPy does not always have the original allocation's data pointer), should NumPy use byte_offset if it can (but not must)?
  • If the 256-byte requirement applies to data, is there also an alignment expectation on data + byte_offset?
  • NumPy has only just gained functionality that could give us 256-byte-aligned allocations. Do you consider all implementations that cannot guarantee this alignment buggy for the time being? Are importers OK to rely on it, even though the note here warns them that a lot of things are effectively buggy?

tqchen (Member) commented Nov 9, 2021

There is no alignment expectation on data + byte_offset.
I don't think the "allocation ptr" wording is a strict requirement: if we get a pointer that is not aligned to 256 bytes, we can still build a representation that makes data point to the 256-byte-aligned base pointer and sets byte_offset accordingly.

The main original goal was to decouple the two, so that in the common case a check of byte_offset alone is sufficient to detect aligned arrays (and, as a result, enables the default optimizations that follow).
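
A minimal sketch of the importer-side check this decoupling enables (the function name and the 64-byte SIMD requirement are illustrative assumptions, not part of the spec):

    #include <stdint.h>

    /* If `data` is guaranteed to be 256-byte aligned, an importer that
     * needs, say, 64-byte alignment for SIMD only has to inspect
     * `byte_offset`; it never has to compute and check
     * `data + byte_offset` itself. */
    static int first_element_is_aligned(uint64_t byte_offset) {
        return byte_offset % 64 == 0;
    }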

seberg (Contributor) commented Nov 9, 2021

> representation that makes data point to the 256-byte-aligned base pointer and sets byte_offset accordingly.

Right, although that pointer may not be accessible, so the importer cannot assume it can read the data "between" data and data + byte_offset (e.g. for faster vectorized access)? (As far as I understood Keith, that could be relevant.)

> we can still build a representation that makes data point to the 256-byte-aligned base pointer and sets byte_offset accordingly

Should NumPy do that, or must NumPy do that (assuming we live in a future where the added note no longer applies)? It is still not clear to me.

> The main original goal was to decouple the two, so that in the common case a check of byte_offset alone is sufficient to detect aligned arrays (and, as a result, enables the default optimizations that follow).

Are you saying that the importer must check whether the data is aligned (no alignment guarantee at all), but may do so by checking only byte_offset itself (since data is assumed to be aligned to 256 bytes)? Please write that in the spec, then.
(I am not sure I understand the point of checking alignment after calculating data + byte_offset, but maybe there is a point on non-CPU devices.)

One thing, maybe: I would be perfectly happy if the alignment guarantees differ per device, if that helps.

seberg (Contributor) commented Nov 9, 2021

So are these right!?

  • data must be aligned to 256 bytes, but might point outside the actual allocation if that is necessary to achieve alignment. This means that byte_offset can be used when the data does not start aligned. (Needs checking: is a 256-byte alignment guaranteed for any possible opaque pointer?! Because if it is not, I don't see any point to this guarantee.)
  • The importer must apply byte_offset on all devices. byte_offset may be any value (with respect to alignment), so the importer must check the alignment of byte_offset if it requires aligned data. However, the importer only needs to check byte_offset, since data may be assumed to be aligned.

Comment: currently there are a lot of buggy implementations, so it may make sense not to rely too much on data being aligned as specified.
EDIT: Or, if you don't like this type of comment, specify that versions up to X.Y have less strict guarantees, because the spec was previously unclear and many implementations do not adhere to it.

seberg (Contributor) commented Nov 9, 2021

And sorry for the knee-jerk reaction of thinking that adding examples of things that do not adhere to the standard is not all that useful...

Thanks, this is much clearer now. However, I still feel it requires the reader to fill in the blanks. And it may not even occur to a reader interested in importing that unaligned data is a thing, or that data could point before the actual data.

rgommers (Contributor, Author) commented Nov 9, 2021

> Right, although that pointer may not be accessible, so the importer cannot assume it can read the data "between" data and data + byte_offset (e.g. for faster vectorized access)? (As far as I understood Keith, that could be relevant.)

I think this is always under the control of the library, right? If the initial allocation is aligned, then one only needs to deal with unaligned data when new arrays are created as views. In that case, calculating the data pointer as (ptr_to_firstelement // 256) * 256 and byte_offset as ptr_to_firstelement % 256 should always be valid. My understanding is that this is what the alignment requirement specifies. I can send a follow-up PR if you all agree.
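
For illustration, that decomposition in C (a sketch; the helper name is made up, and uintptr_t arithmetic is assumed):

    #include <stdint.h>

    /* Split a possibly unaligned pointer to the first element into a
     * 256-byte-aligned `data` base and the matching `byte_offset`,
     * as described above. */
    static void split_ptr(const void *first_element,
                          const void **data, uint64_t *byte_offset) {
        uintptr_t addr = (uintptr_t)first_element;
        *data = (const void *)(addr - addr % 256);  /* (addr // 256) * 256 */
        *byte_offset = addr % 256;
    }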

> Should NumPy do that, or must NumPy do that (assuming we live in a future where the added note no longer applies)? It is still not clear to me.

It's must; that is what "This pointer is always aligned to 256 bytes" says (if that is not clear, how about we change "is" to "must"?). The disclaimer below says "please don't enforce this just yet".

So are these right!?

It sounds right to me.

kkraus14 commented Nov 9, 2021

My 2c here: having the data pointer always be aligned to 256 bytes even if the allocation isn't guaranteed to be aligned to at least 256 bytes just causes extra work without any benefit. If I understand correctly, there's no guarantee that it's safe to read the bytes between data and data + byte_offset.

What benefit does this give us over just allowing data to not be 256 byte aligned (which a bunch of frameworks are already doing)?

rgommers (Contributor, Author) commented Nov 9, 2021

> And sorry for the knee-jerk reaction of thinking that adding examples of things that do not adhere to the standard is not all that useful...

I think it is useful, although it's a bit awkward to put it inside the header - the current comment is already too long. I'd like to write some html docs that have:

rgommers (Contributor, Author) commented Nov 9, 2021

> What benefit does this give us over just allowing data to not be 256 byte aligned (which a bunch of frameworks are already doing)?

Weren't you the one arguing in favor of having an aligned data pointer in data-apis/array-api#293 (comment), @kkraus14? I'm confused now. It's either this, or just giving up and having zero alignment guarantees, I think.

seberg (Contributor) commented Nov 9, 2021

> If the initial allocation is aligned, then one only needs to deal with unaligned data when new arrays are created as views.

Yes, but byte_offset is unsigned. If the actual data starts unaligned, data would point before the allocation, whereas with an aligned allocator data would normally point at or after the start of the allocation. So I think this is a subtlety that would be good to make painfully clear. The point is that the range from data to data + byte_offset may not be readable.
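
For illustration with made-up addresses (hypothetical numbers, not from this thread):

    /*
     * An unaligned allocation:
     *
     *   allocation starts at:  0x1015  (not 256-byte aligned)
     *   data (aligned down):   0x1000
     *   byte_offset:           0x15
     *
     * The bytes in [0x1000, 0x1015) lie BEFORE the allocation, so an
     * importer must not read them (e.g. as part of a wide vectorized
     * load starting at `data`).
     */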

> The disclaimer below says "please don't enforce this just yet".

Yes, OK. My frustration is that I want to (and am expected to) merge dlpack support into NumPy. That means I have to either:

  • Not follow the standard
  • Break the pytorch importer (i.e. pytorch does not follow the standard; the note in the header does not matter here, since it addresses the importer, not the exporter, and for NumPy exporting is what is relevant)
  • In either case, NumPy will break importers who did not realize that unaligned data is possible, so I would like that to be made clearer as well. (This is probably not super relevant in practice, but still important IMO!)

I am quite certain we should just break pytorch, but until now nobody has thought of enforcing the 256-byte-aligned data pointer!

> I think it is useful, although it's a bit awkward to put it inside the header

Yes, it is super useful! But it does not help me with making a call on whether to break the standard or break pytorch (or importers not expecting unaligned data).

> What benefit does this give us over just allowing data to not be 256 byte aligned

Agreed with @kkraus14: I do not understand the advantage, but I am happy to accept it :).

rgommers (Contributor, Author) commented Nov 9, 2021

> I am quite certain we should just break pytorch

No, I think we should do this: data-apis/array-api#293 (comment). We can get to a situation where everyone adheres to the standard without breaking anything for anyone (unless they use really old versions of a library). No matter how this discussion ends, it's about a future state - we can and should merge the NumPy PR with byte_offset = 0 now.

kkraus14 commented Nov 9, 2021

> Weren't you the one arguing in favor of having an aligned data pointer in data-apis/array-api#293 (comment), @kkraus14? I'm confused now. It's either this, or just giving up and having zero alignment guarantees, I think.

I was arguing in favor of having aligned allocations, where it's then safe to read from data to data + byte_offset + size, because you can then take much more optimized code paths. Without a guarantee that you can safely read between data and data + byte_offset, it doesn't help.

seberg (Contributor) commented Nov 9, 2021

@rgommers pytorch currently ignores byte_offset (or am I mistaken?). Thus, NumPy cannot give alignment guarantees on data without breaking pytorch. So yes, not breaking pytorch means violating the current wording of the standard.

rgommers (Contributor, Author) commented Nov 9, 2021

> @rgommers pytorch currently ignores byte_offset (or am I mistaken?). Thus, NumPy cannot give alignment guarantees on data without breaking pytorch. So yes, not breaking pytorch means violating the current wording of the standard.

Yes, you're not mistaken. As I said in the linked comment above, that is why we need gradual evolution: start with byte_offset = 0, fix the consumer (from_dlpack) code in all libraries, and only then start using a nonzero byte_offset.

seberg (Contributor) commented Nov 9, 2021

Sorry, but IMO, if the current state is that NumPy must export with byte_offset = 0 (in all relevant code paths, at least), then the standard should say so. The comment is not clear about this: it says that exporters do not adhere to it, not that importers do not adhere.

The comment tells me: please add extra checks on import. But not breaking pytorch tells me: do not follow the standard's alignment requirement for data until pytorch fixes its import!

seberg (Contributor) commented Nov 9, 2021

OK, so: the way to adhere to the standard for now is just to produce byte_offset = 0 (not in line with the header text!) and then switch at some future point. Sorry, but it is confusing that the way to follow the __dlpack__ standard is to explicitly break with the header, no?

I can live with that. I assume this also means NumPy should be happy to export unaligned data?

rgommers (Contributor, Author) commented Nov 9, 2021

> I was arguing in favor of having aligned allocations, where it's then safe to read from data to data + byte_offset + size, because you can then take much more optimized code paths. Without a guarantee that you can safely read between data and data + byte_offset, it doesn't help.

It sounds like you are asking for optional alignment, then. There's no place to put such information. I think it may be better to move that conversation back to data-apis/array-api#293; let me comment there.

seberg (Contributor) commented Nov 9, 2021

OK, this is NumPy-related only, but I pushed this then:

    /*
     * Note: the `dlpack.h` header suggests/standardizes that `data` must be
     * 256-byte aligned.  We ignore this intentionally, because `__dlpack__`
     * standardizes that `byte_offset` must be 0 (for now) to not break pytorch:
     * https://github.com/data-apis/array-api/issues/293#issuecomment-964111413
     *
     * We further assume that exporting fully unaligned data is OK even without
     * `byte_offset` since the standard does not reject it.
     * Presumably, pytorch will support importing `byte_offset != 0` and NumPy
     * can choose to use it starting about 2023.  At that point, it may be
     * that NumPy MUST use `byte_offset` to adhere to the standard (as
     * specified in the header)!
     */
    managed->dl_tensor.data = PyArray_DATA(self);
    managed->dl_tensor.byte_offset = 0;

rgommers (Contributor, Author) commented Nov 9, 2021

> OK, this is NumPy-related only, but I pushed this then:

Thank you, that seems good to me.
