Add a note on alignment requirement of data pointer #83

Merged (1 commit) on Nov 9, 2021

Conversation

tqchen merged commit 98861a5 into dmlc:main on Nov 9, 2021
rgommers deleted the note-on-alignment branch on November 9, 2021 14:33
seberg (Contributor) commented Nov 9, 2021

A few questions:

  • "allocation pointer" feels a bit unclear, but is probably fine. Let's say we are on the CPU: when should I use byte_offset (or when must I use it)? If you create a view into another array (NumPy does not always have the original allocation's data pointer), should NumPy use byte_offset if it can (but not must)?
  • If the 256-byte requirement applies to data, is there also an alignment expectation on data + byte_offset?
  • NumPy has only just gained functionality that could give us 256-byte-aligned allocations. Do you consider all implementations that cannot guarantee this alignment buggy for the time being? Are importers OK to rely on it, even though the note here warns them that a lot of things are effectively buggy?

tqchen (Member) commented Nov 9, 2021

There is no alignment expectation on data + byte_offset.
I don't think the "allocation ptr" wording is a strict requirement: if we get a pointer that is not aligned to 256 bytes, we can still build a representation that makes data point to the 256-byte-aligned base pointer and sets byte_offset accordingly.

The main original goal was to decouple the two, so that in the common case a check of byte_offset alone is sufficient to detect aligned arrays (and, as a result, enables the default optimizations that follow).
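
A minimal sketch of the importer-side check this decoupling enables (the function name and the 64-byte SIMD requirement are illustrative assumptions, not part of the spec):

    #include <stdint.h>

    /* If `data` is guaranteed to be 256-byte aligned, an importer that
     * needs, say, 64-byte alignment for SIMD only has to inspect
     * `byte_offset`; it never has to compute and check
     * `data + byte_offset` itself. */
    static int first_element_is_aligned(uint64_t byte_offset) {
        return byte_offset % 64 == 0;
    }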

seberg (Contributor) commented Nov 9, 2021

> representation that makes data point to the 256-byte-aligned base pointer and sets byte_offset accordingly.

Right, although that pointer may not be accessible, so the importer cannot assume it can read the data "between" data and data + byte_offset (e.g. for faster vectorized access)? (As far as I understood Keith, that could be relevant.)

> we can still build a representation that makes data point to the 256-byte-aligned base pointer and sets byte_offset accordingly

Should NumPy do that, or must NumPy do that (assuming we live in a future where the added note no longer applies)? It is still not clear to me.

> The main original goal was to decouple the two, so that in the common case a check of byte_offset alone is sufficient to detect aligned arrays (and, as a result, enables the default optimizations that follow).

Are you saying that the importer must check whether the data is aligned (no alignment guarantee at all), but may do so by checking only byte_offset itself (since data is assumed to be aligned to 256 bytes)? Please write that in the spec, then.
(I am not sure I understand the point of checking alignment after calculating data + byte_offset, but maybe there is a point on non-CPU devices.)

One thing, maybe: I would be perfectly happy if the alignment guarantees differ per device, if that helps.

seberg (Contributor) commented Nov 9, 2021

So are these right!?

  • data must be aligned to 256 bytes, but might point outside the actual allocation if that is necessary to achieve alignment. This means that byte_offset can be used when the data does not start aligned. (Needs checking: is a 256-byte alignment guaranteed for any possible opaque pointer?! Because if it is not, I don't see any point to this guarantee.)
  • The importer must apply byte_offset on all devices. byte_offset may be any value (with respect to alignment), so the importer must check the alignment of byte_offset if it requires aligned data. However, the importer only needs to check byte_offset, since data may be assumed to be aligned.

Comment: currently there are a lot of buggy implementations, so it may make sense not to rely too much on data being aligned as specified.
EDIT: Or, if you don't like this type of comment, specify that versions up to X.Y have less strict guarantees, because the spec was previously unclear and many implementations do not adhere to it.

seberg (Contributor) commented Nov 9, 2021

And sorry for the knee-jerk reaction of thinking that adding examples of things that do not adhere to the standard is not all that useful...

Thanks, this is much clearer now. However, I still feel it requires the reader to fill in the blanks. And it may not even occur to a reader interested in importing that unaligned data is a thing, or that data could point before the actual data.

rgommers (Contributor, Author) commented Nov 9, 2021

> Right, although that pointer may not be accessible, so the importer cannot assume it can read the data "between" data and data + byte_offset (e.g. for faster vectorized access)? (As far as I understood Keith, that could be relevant.)

I think this is always under the control of the library, right? If the initial allocation is aligned, then one only needs to deal with unaligned data when new arrays are created as views. In that case, calculating the data pointer as (ptr_to_firstelement // 256) * 256 and byte_offset as ptr_to_firstelement % 256 should always be valid. My understanding is that this is what the alignment requirement specifies. I can send a follow-up PR if you all agree.
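
For illustration, that decomposition in C (a sketch; the helper name is made up, and uintptr_t arithmetic is assumed):

    #include <stdint.h>

    /* Split a possibly unaligned pointer to the first element into a
     * 256-byte-aligned `data` base and the matching `byte_offset`,
     * as described above. */
    static void split_ptr(const void *first_element,
                          const void **data, uint64_t *byte_offset) {
        uintptr_t addr = (uintptr_t)first_element;
        *data = (const void *)(addr - addr % 256);  /* (addr // 256) * 256 */
        *byte_offset = addr % 256;
    }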

> Should NumPy do that, or must NumPy do that (assuming we live in a future where the added note no longer applies)? It is still not clear to me.

It's must; that is what "This pointer is always aligned to 256 bytes" says (if that is not clear, how about we change "is" to "must"?). The disclaimer below says "please don't enforce this just yet".

So are these right!?

It sounds right to me.

kkraus14 commented Nov 9, 2021

My 2c here: having the data pointer always be aligned to 256 bytes even if the allocation isn't guaranteed to be aligned to at least 256 bytes just causes extra work without any benefit. If I understand correctly, there's no guarantee that it's safe to read the bytes between data and data + byte_offset.

What benefit does this give us over just allowing data to not be 256 byte aligned (which a bunch of frameworks are already doing)?

rgommers (Contributor, Author) commented Nov 9, 2021

> And sorry for the knee-jerk reaction of thinking that adding examples of things that do not adhere to the standard is not all that useful...

I think it is useful, although it's a bit awkward to put it inside the header - the current comment is already too long. I'd like to write some html docs that have:

rgommers (Contributor, Author) commented Nov 9, 2021

> What benefit does this give us over just allowing data to not be 256 byte aligned (which a bunch of frameworks are already doing)?

Weren't you the one arguing in favor of having an aligned data pointer in data-apis/array-api#293 (comment), @kkraus14? I'm confused now. It's either this, or just giving up and having zero alignment guarantees, I think.

seberg (Contributor) commented Nov 9, 2021

> If the initial allocation is aligned, then one only needs to deal with unaligned data when new arrays are created as views.

Yes, but byte_offset is unsigned. If the actual data starts unaligned, data would point before the allocation, whereas with an aligned allocator data would normally point at or after the start of the allocation. So I think this is a subtlety that would be good to make painfully clear. The point is that the range from data to data + byte_offset may not be readable.
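
For illustration with made-up addresses (hypothetical numbers, not from this thread):

    /*
     * An unaligned allocation:
     *
     *   allocation starts at:  0x1015  (not 256-byte aligned)
     *   data (aligned down):   0x1000
     *   byte_offset:           0x15
     *
     * The bytes in [0x1000, 0x1015) lie BEFORE the allocation, so an
     * importer must not read them (e.g. as part of a wide vectorized
     * load starting at `data`).
     */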

> The disclaimer below says "please don't enforce this just yet".

Yes, OK. My frustration is that I want to (and am expected to) merge dlpack support into NumPy. That means I have to either:

  • Not follow the standard
  • Break the pytorch importer (i.e. pytorch does not follow the standard; the note in the header does not matter here, since it addresses the importer, not the exporter, and for NumPy exporting is what is relevant)
  • In either case, NumPy will break importers who did not realize that unaligned data is possible, so I would like that to be made clearer as well. (This is probably not super relevant in practice, but still important IMO!)

I am quite certain we should just break pytorch, but until now nobody has thought of enforcing the 256-byte-aligned data pointer!

> I think it is useful, although it's a bit awkward to put it inside the header

Yes, it is super useful! But it does not help me with making a call on whether to break the standard or break pytorch (or importers not expecting unaligned data).

> What benefit does this give us over just allowing data to not be 256 byte aligned

Agreed with @kkraus14: I do not understand the advantage, but I am happy to accept it :).

rgommers (Contributor, Author) commented Nov 9, 2021

> I am quite certain we should just break pytorch

No, I think we should do this: data-apis/array-api#293 (comment). We can get to a situation where everyone adheres to the standard without breaking anything for anyone (unless they use really old versions of a library). No matter how this discussion ends, it's about a future state - we can and should merge the NumPy PR with byte_offset = 0 now.

kkraus14 commented Nov 9, 2021

> Weren't you the one arguing in favor of having an aligned data pointer in data-apis/array-api#293 (comment), @kkraus14? I'm confused now. It's either this, or just giving up and having zero alignment guarantees, I think.

I was arguing in favor of having aligned allocations, where it's then safe to read from data to data + byte_offset + size, because you can then take much more optimized code paths. Without a guarantee that you can safely read between data and data + byte_offset, it doesn't help.

seberg (Contributor) commented Nov 9, 2021

@rgommers pytorch currently ignores byte_offset (or am I mistaken?). Thus, NumPy cannot give alignment guarantees on data without breaking pytorch. So yes, not breaking pytorch means violating the current wording of the standard.

rgommers (Contributor, Author) commented Nov 9, 2021

> @rgommers pytorch currently ignores byte_offset (or am I mistaken?). Thus, NumPy cannot give alignment guarantees on data without breaking pytorch. So yes, not breaking pytorch means violating the current wording of the standard.

Yes, you're not mistaken. As I said in the linked comment above, that is why we need gradual evolution: start with byte_offset = 0, fix the consumer (from_dlpack) code in all libraries, and only then start using a nonzero byte_offset.

seberg (Contributor) commented Nov 9, 2021

Sorry, but IMO, if the current state is that NumPy must export with byte_offset = 0 (in all relevant code paths, at least), then the standard should say so. The comment is not clear about this: it says that exporters do not adhere to it, not that importers do not adhere.

The comment tells me: please add extra checks on import. But not breaking pytorch tells me: do not follow the standard's alignment requirement for data until pytorch fixes its import!

seberg (Contributor) commented Nov 9, 2021

OK, so: the way to adhere to the standard for now is just to produce byte_offset = 0 (not in line with the header text!) and then switch at some future point. Sorry, but it is confusing that the way to follow the __dlpack__ standard is to explicitly break with the header, no?

I can live with that. I assume this also means NumPy should be happy to export unaligned data?

rgommers (Contributor, Author) commented Nov 9, 2021

> I was arguing in favor of having aligned allocations, where it's then safe to read from data to data + byte_offset + size, because you can then take much more optimized code paths. Without a guarantee that you can safely read between data and data + byte_offset, it doesn't help.

It sounds like you are asking for optional alignment, then. There's no place to put such information. I think it may be better to move that conversation back to data-apis/array-api#293; let me comment there.

seberg (Contributor) commented Nov 9, 2021

OK, this is NumPy-related only, but I pushed this then:

    /*
     * Note: the `dlpack.h` header suggests/standardizes that `data` must be
     * 256-byte aligned.  We ignore this intentionally, because `__dlpack__`
     * standardizes that `byte_offset` must be 0 (for now) to not break pytorch:
     * https://github.com/data-apis/array-api/issues/293#issuecomment-964111413
     *
     * We further assume that exporting fully unaligned data is OK even without
     * `byte_offset` since the standard does not reject it.
     * Presumably, pytorch will support importing `byte_offset != 0` and NumPy
     * can choose to use it starting about 2023.  At that point, it may be
     * that NumPy MUST use `byte_offset` to adhere to the standard (as
     * specified in the header)!
     */
    managed->dl_tensor.data = PyArray_DATA(self);
    managed->dl_tensor.byte_offset = 0;

rgommers (Contributor, Author) commented Nov 9, 2021

> OK, this is NumPy-related only, but I pushed this then:

Thank you, that seems good to me.
