Write out the DMA buffer partially? #573

Open
vlovich opened this issue Oct 20, 2022 · 6 comments

Comments

@vlovich
Contributor

vlovich commented Oct 20, 2022

In normal I/O, if you wanted your data structure aligned on some large boundary (e.g. 8 MiB) but wanted to do smaller writes, you would simply write out however much data you have (aligned to 4 KiB boundaries for Direct I/O) and lseek to the next 8 MiB boundary.
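For illustration, the write-then-seek pattern described above could be sketched with plain `std` file I/O. This is not glommio code, and it uses buffered I/O rather than Direct I/O (opening with `O_DIRECT` would additionally require an aligned memory buffer). The names `align_up`, `write_then_skip`, `BLOCK`, and `REGION` are made up for this sketch.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom, Write};

/// Block size the discussion assumes for Direct I/O (4 KiB).
pub const BLOCK: u64 = 4 * 1024;
/// Large alignment boundary used as the running example (8 MiB).
pub const REGION: u64 = 8 * 1024 * 1024;

/// Round `n` up to the next multiple of `align` (`align` must be a power of two).
pub fn align_up(n: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (n + align - 1) & !(align - 1)
}

/// Write `data` at the current position, zero-padded to a 4 KiB multiple,
/// then seek to the next 8 MiB boundary. Returns the new file position.
pub fn write_then_skip(file: &mut File, data: &[u8]) -> io::Result<u64> {
    let start = file.stream_position()?;
    let padded_len = align_up(data.len() as u64, BLOCK) as usize;

    // Pad the tail with zeroes so the write length is block-aligned.
    let mut padded = vec![0u8; padded_len];
    padded[..data.len()].copy_from_slice(data);
    file.write_all(&padded)?;

    // Skip the rest of the region; nothing is written for the gap here.
    let next = align_up(start + padded_len as u64, REGION);
    file.seek(SeekFrom::Start(next))
}
```

Seeking past the end does not change the file length by itself; the gap only becomes part of the file once something is written at or beyond the new position.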

I recognize that glommio can't do this today, but I wonder if such functionality could be added. One way I can see this working: I allocate an 8 MiB DMA buffer, and if I fill it only partially (e.g. 128 KiB), I can either call .truncate on the DmaBuffer (which would require that the truncated length still has valid alignment) or call a write_at_partially, so that I don't write the entire buffer to disk.
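To make the proposed `.truncate` semantics concrete, here is a minimal sketch on a stand-in type. `SketchBuffer` is hypothetical and is not glommio's `DmaBuffer`; it is backed by a plain `Vec<u8>`, which does not guarantee the memory alignment a real DMA buffer needs. Only the length check is the point.

```rust
/// Hypothetical stand-in for a DMA buffer; NOT glommio's `DmaBuffer`.
/// A `Vec<u8>` gives no memory-alignment guarantee; only the length
/// rule of the proposed `truncate` is modelled here.
pub struct SketchBuffer {
    data: Vec<u8>,
    len: usize,   // logical length that a write would use
    align: usize, // required alignment of the length, e.g. 4096
}

impl SketchBuffer {
    pub fn new(capacity: usize, align: usize) -> Self {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        assert!(capacity % align == 0, "capacity must be aligned");
        Self { data: vec![0; capacity], len: capacity, align }
    }

    /// Shrink the logical length. Rejects growth and unaligned lengths.
    pub fn truncate(&mut self, new_len: usize) -> Result<(), String> {
        if new_len > self.len {
            return Err(format!("cannot grow from {} to {}", self.len, new_len));
        }
        if new_len % self.align != 0 {
            return Err(format!("{} is not a multiple of {}", new_len, self.align));
        }
        self.len = new_len;
        Ok(())
    }

    /// The bytes a subsequent write would send to disk.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data[..self.len]
    }

    pub fn as_bytes_mut(&mut self) -> &mut [u8] {
        &mut self.data[..self.len]
    }
}
```

With this shape, an 8 MiB buffer filled with 128 KiB of data would be truncated to 128 KiB before the write, and a request such as `truncate(1000)` would be refused.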

@vlovich
Contributor Author

vlovich commented Oct 22, 2022

Perhaps exposing trim_to_size is all that's needed, although I don't know whether that would actually work (I haven't tried it).

@glommer
Collaborator

glommer commented Oct 24, 2022

Hello @vlovich.

This should definitely work with some variation of write_at and filesystem flags to extend the size. Are you talking specifically about the stream API?

@vlovich
Contributor Author

vlovich commented Oct 24, 2022

No, direct I/O via DmaFile. I'm managing my own buffers directly, so the stream API isn't really useful for my purposes. Specifically, I have no problem creating holes with write_to. The problem is that I don't know what the actual size of the write will be at the time of allocation.

Basically I:
1. Allocate 8 MiB
2. User input fills up some portion of the 8 MiB
3. I write out the buffer

I want to change step 3 so that if the user only fills up 1 MiB, I write out only 1 MiB instead of 1 MiB of data plus 7 MiB of zeroes. The next buffer would be written out at pos + 8 MiB, so the kernel would do all the right things to create an interim hole. I've tested that this part works; the missing piece is the ability to change the size of the write after step 2, since step 1 acquires an 8 MiB buffer.
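The desired step 3 can be sketched with positional writes from `std` on Unix. This stands in for what a partial write on a `DmaFile` might look like; it does not use glommio or Direct I/O, and `write_filled_prefix`, `SLOT_SIZE`, and `BLOCK_SIZE` are names invented for the example.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Size of each on-disk slot (8 MiB), matching the buffer allocated in step 1.
pub const SLOT_SIZE: u64 = 8 * 1024 * 1024;
/// Write granularity assumed for Direct I/O (4 KiB).
pub const BLOCK_SIZE: usize = 4 * 1024;

/// Write only the filled prefix of `buf`, rounded up to a 4 KiB multiple,
/// at the start of slot number `slot`. Returns the number of bytes written.
pub fn write_filled_prefix(
    file: &File,
    slot: u64,
    buf: &[u8],
    filled: usize,
) -> io::Result<usize> {
    assert!(filled <= buf.len(), "filled length exceeds the buffer");
    // Round up so the write length stays block-aligned, capped at the buffer size.
    let rounded = (filled + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
    let to_write = rounded.min(buf.len());
    // Positional write: the rest of the slot is never touched.
    file.write_all_at(&buf[..to_write], slot * SLOT_SIZE)?;
    Ok(to_write)
}
```

Writing slot 0 and then slot 1 with 1 MiB filled in each leaves the range from 1 MiB to 8 MiB unwritten; reads from that range return zeroes, and whether it is stored as a hole depends on the filesystem.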

@glommer
Collaborator

glommer commented Oct 24, 2022

I think your best bet is a positional write followed (or preceded) by fallocate or ftruncate.

There is no such thing as unallocated space in a file in general from the VFS PoV. Individual filesystems may have optimizations like that, but if a file has a certain size, the filesystem will commit blocks to it.

Whether or not they get zeroed is a different matter, but they usually are - otherwise you would just access bytes from another application that may have released the file.

I'd encourage you to take a look at both ftruncate and fallocate (glommio exposes both) and figure out which works best. fallocate has more specialized modes that may not zero, but they are full of caveats.
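As a rough illustration of the "positional write followed by ftruncate" suggestion, here is a sketch using `std` only. On Unix, `File::set_len` is implemented with `ftruncate`. This is not glommio's API, and `write_then_extend` is a hypothetical helper name.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Write `data` at `offset`, then make sure the file extends to the end of
/// the `region`-sized area starting at `offset`. Returns the region's end.
pub fn write_then_extend(
    file: &File,
    offset: u64,
    data: &[u8],
    region: u64,
) -> io::Result<u64> {
    // Positional write of just the data we have.
    file.write_all_at(data, offset)?;

    // Extend (never shrink) the file so the region is fully covered.
    // The extended range reads back as zeroes.
    let region_end = offset + region;
    if file.metadata()?.len() < region_end {
        file.set_len(region_end)?;
    }
    Ok(region_end)
}
```

Whether the extended range consumes disk blocks is up to the filesystem, which is the question the rest of the thread turns on.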

@vlovich
Contributor Author

vlovich commented Oct 24, 2022

That doesn't actually help, because you're still going to get write amplification to the flash. Most Linux filesystems (XFS, ext4, btrfs AFAIK) will write a much smaller amount of data to record the hole, which would significantly mitigate the amplification.
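One way to check how a given filesystem handles this is to compare a file's logical size with the space actually allocated to it. The sketch below uses `std` on Unix, where `MetadataExt::blocks` reports allocation in 512-byte units. `logical_and_allocated` is a hypothetical helper, and the numbers it returns will vary by filesystem and mount options.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Return `(logical_size, allocated_bytes)` for the file at `path`.
/// `allocated_bytes` comes from `st_blocks`, which counts 512-byte units.
pub fn logical_and_allocated(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    Ok((meta.len(), meta.blocks() * 512))
}
```

If the allocated figure is much smaller than the logical size after writing 1 MiB into each 8 MiB slot, the gaps are being stored as holes on that filesystem.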

@vlovich
Contributor Author

vlovich commented Oct 24, 2022

Also, fallocate itself isn't exposed AFAICT. It's only exposed within the crate so that pre_allocate can invoke it.

pre_allocate itself has the surprising (wrong?) behavior that calling it on an existing file ends up erasing whatever is already in there, which isn't what fallocate is supposed to do:

"Any subregion within the range specified by offset and len that did not contain data before the call will be initialized to zero."


2 participants