[Major Change] Enforcing tensor alignment #148
would this change require changes to #44? or would it be transparent?
"superior" is ambiguous, I would add "(i.e., lower than)"
Transparent, it's only making sure the header JSON is of the appropriate size by adding extra padding.
True!
- The header now automatically aligns itself to 8 bytes (the size of an f64) by appending extra spaces as necessary.
- This allows extra-fast memory mapping by reinterpreting bytes as f32/f64, etc. Unaligned bytes do not allow for this: https://www.reddit.com/r/rust/comments/tanaxm/mutating_a_buffer_of_u8s_as_f32s_in_place/
- This does not change the contiguousness of tensors.
- This does not change the actual spec (we're just putting extra valid bytes in the header and using a different serialization ordering).
- Readers should still be able to read old files; the data would just need to be copied before being cast to its final type when using mmap.
- This has no effect for GPU since a copy is already necessary (*I think*; it actually depends on whether the CUDA API allows filling f32 addresses from raw unaligned bytes).

This change will only be interesting if things like https://github.com/Narsil/fast_gpt2 actually pick up. And even with the copy, load times are still vastly superior to `pytorch`.

We need to be able to read old files.
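To make the padding idea above concrete, here is a minimal sketch, not the code from this PR. The names `pad_header` and `view_as_f32` are hypothetical, and it assumes the file layout is an 8-byte length prefix followed by the JSON header and then the tensor data, so that padding the header to a multiple of 8 also places the data at an 8-byte-aligned file offset.

```rust
/// Alignment target taken from the PR description (8 bytes, the size of an f64).
const ALIGN: usize = 8;

/// Hypothetical helper: pad a serialized JSON header with trailing spaces so
/// that its length is a multiple of `ALIGN`. Trailing whitespace is still
/// valid JSON, so readers that parse the header are unaffected.
fn pad_header(mut header: Vec<u8>) -> Vec<u8> {
    let extra = (ALIGN - header.len() % ALIGN) % ALIGN;
    header.extend(std::iter::repeat(b' ').take(extra));
    header
}

/// Hypothetical helper: view a byte slice as `f32`s without copying.
/// Returns `None` when the slice is misaligned or its length is not a
/// multiple of 4; in that case the caller has to copy the data instead.
fn view_as_f32(bytes: &[u8]) -> Option<&[f32]> {
    let align = std::mem::align_of::<f32>();
    let size = std::mem::size_of::<f32>();
    if bytes.as_ptr() as usize % align != 0 || bytes.len() % size != 0 {
        return None;
    }
    // SAFETY: the pointer is aligned for f32, the length covers only whole
    // f32 values, every bit pattern is a valid f32, and the returned slice
    // borrows from `bytes`, so it cannot outlive the underlying buffer.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const f32, bytes.len() / size)
    })
}
```

With a 7-byte header such as `{"a":1}`, `pad_header` appends a single space to reach 8 bytes. `view_as_f32` succeeds only on an aligned buffer, which is the case a reader of an old, unpadded file may not hit and would have to handle by copying.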
lgtm, what prompted the change from `BTreeMap` to `HashMap`?
Complexity: O(log n) vs. O(1). It shouldn't really matter in practice, but since I no longer rely on the structure for ordering, I don't need a BTree anymore.