
[Major Change] Enforcing tensor alignment #148

Merged 4 commits into main from forcing_alignment on Feb 21, 2023

Conversation

@Narsil Narsil (Collaborator) commented Jan 4, 2023

  • The header now automatically aligns itself to 8 bytes (f64) by
    appending extra spaces as necessary (see the sketch after this list).
  • This allows very fast memory mapping by reinterpreting bytes as
    f32/f64, etc.; unaligned bytes do not allow for this. https://www.reddit.com/r/rust/comments/tanaxm/mutating_a_buffer_of_u8s_as_f32s_in_place/
  • This does not change the contiguity of tensors.
  • This does not change the actual spec (we're just putting extra valid bytes
    in the header and using a different serialization ordering).
  • Readers should still be able to read old files; they would just need
    to be copied before being cast to their final destination when using
    mmap.
  • This has no effect for GPU, since a copy is already necessary (I think;
    it actually depends on whether the CUDA API allows filling f32 addresses
    from raw unaligned bytes).
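
As a rough sketch of the first two points (illustrative only, not the crate's actual code; `pad_header` and `as_f32_slice` are hypothetical helpers), assuming the 8-byte length prefix keeps the header start aligned:

```rust
/// Pad the serialized JSON header with trailing spaces (valid JSON
/// whitespace) so that, together with the 8-byte length prefix, the
/// tensor data that follows starts on an 8-byte (f64) boundary.
fn pad_header(mut header: Vec<u8>) -> Vec<u8> {
    const ALIGN: usize = 8;
    let remainder = header.len() % ALIGN;
    if remainder != 0 {
        header.extend(std::iter::repeat(b' ').take(ALIGN - remainder));
    }
    header
}

/// Reinterpret a byte buffer as f32s without copying. With aligned
/// data the prefix/suffix returned by `align_to` are empty; with an
/// unaligned buffer (e.g. an old, unpadded file) this returns None
/// and the bytes must be copied first.
fn as_f32_slice(bytes: &[u8]) -> Option<&[f32]> {
    // Safety: every bit pattern is a valid f32, so reinterpreting
    // initialized bytes is sound once alignment and length match.
    let (prefix, floats, suffix) = unsafe { bytes.align_to::<f32>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(floats)
    } else {
        None
    }
}
```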

This change will only be interesting if things like https://github.com/Narsil/fast_gpt2
actually pick up. And even with the copy, load times are still vastly
faster than pytorch/transformers.

@Narsil Narsil marked this pull request as draft January 4, 2023 16:39
@julien-c julien-c (Member) commented Jan 4, 2023

Would this change require changes to #44, or would it be transparent?

> And even with the copy, load times are still vastly
> superior to pytorch/transformers.

"superior" is ambiguous, I would add "(i.e., lower than)".

@Narsil Narsil (Collaborator, Author) commented Jan 4, 2023

> Would this change require changes to #44, or would it be transparent?

Transparent: it's only making sure the header JSON is of the appropriate size by adding extra padding. (Also, the tensor order is modified, which readers should be agnostic to anyway.)

> > And even with the copy, load times are still vastly
> > superior to pytorch/transformers.
>
> "superior" is ambiguous, I would add "(i.e., lower than)"

True!
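
For the transparency point: JSON parsers treat trailing spaces as insignificant whitespace, so a padded header parses exactly like an unpadded one. A minimal check, assuming a serde_json-based reader (the header entry shown is made up for illustration):

```rust
use serde_json::Value;

fn main() {
    // A header padded with trailing spaces, as the new writer produces.
    let padded = r#"{"weight":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]}}      "#;
    // serde_json ignores trailing whitespace, so existing readers keep working.
    let header: Value = serde_json::from_str(padded).expect("valid JSON");
    assert_eq!(header["weight"]["dtype"], "F32");
}
```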

@Narsil Narsil marked this pull request as ready for review February 8, 2023 11:42
@McPatate McPatate (Member) left a comment

lgtm, what prompted the change from BTreeMap to HashMap?

@Narsil Narsil (Collaborator, Author) commented Feb 20, 2023

> lgtm, what prompted the change from BTreeMap to HashMap?

Complexity: O(log n) vs O(1). It shouldn't really matter in practice, but since I no longer rely on the structure for ordering, I don't need a BTreeMap anymore.
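
For context, a quick standard-library illustration of the trade-off (this is not code from the PR):

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    // BTreeMap keeps keys sorted (useful when serialization order
    // must be deterministic), at O(log n) per lookup.
    let btree: BTreeMap<&str, u32> = [("b", 1), ("a", 0)].into_iter().collect();
    assert_eq!(btree.keys().copied().collect::<Vec<_>>(), ["a", "b"]);

    // HashMap gives O(1) average lookups but no ordering guarantee,
    // which is fine once ordering is handled elsewhere.
    let hash: HashMap<&str, u32> = [("b", 1), ("a", 0)].into_iter().collect();
    assert_eq!(hash["a"], 0);
}
```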

@Narsil Narsil merged commit 0c5b3a6 into main Feb 21, 2023
@Narsil Narsil deleted the forcing_alignment branch February 21, 2023 14:23