specialize io::copy to use copy_file_range, splice or sendfile #75272

Merged
merged 13 commits into from
Nov 14, 2020

Conversation

the8472
Member

@the8472 the8472 commented Aug 7, 2020

Fixes #74426.
Also covers #60689 but only as an optimization instead of an official API.

The specialization only covers std-owned structs so it should avoid the problems with #71091

Currently Linux-only, but it should be generalizable to other Unix systems that have sendfile/sosplice and similar.

There is a bit of optimization potential around the syscall count. Right now it may end up doing more syscalls than the naive copy loop when doing short (<8KiB) copies between file descriptors.

The test case executes the following:

[pid 103776] statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=17, ...}) = 0
[pid 103776] write(4, "wxyz", 4)        = 4
[pid 103776] write(4, "iklmn", 5)       = 5
[pid 103776] copy_file_range(3, NULL, 4, NULL, 5, 0) = 5

0-1 stat calls to identify the source file type: 0 if the type can be inferred from the struct from which the FD was extracted.
𝖬 writes to drain the BufReader/BufWriter wrappers. These only happen when buffers are present. 𝖬 ≾ number of wrappers present. If there is a write buffer, it may absorb the read buffer contents first, resulting in only a single write. Vectored writes would also be an option, but that would require more invasive changes to BufWriter.
𝖭 copy_file_range/splice/sendfile calls until the file size, EOF or the byte limit from Take is reached. This should generally be much more efficient than the read-write loop and also has other benefits such as DMA offload or extent sharing.
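
For illustration, a caller-side sketch of the kind of copy the trace above corresponds to: a BufReader with a filled buffer, a Take limit and a BufWriter holding pending bytes. The helper name `copy_buffered`, the `head:` prefix and the temp file names are made up for this example, and which syscalls actually get issued depends on platform and kernel. The observable result should be the same either way.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Copies at most `limit` bytes from `src` to `dst` through buffered wrappers.
/// (Hypothetical helper, not part of std or of this PR.)
fn copy_buffered(src: &Path, dst: &Path, limit: u64) -> io::Result<u64> {
    let mut reader = BufReader::new(File::open(src)?);
    // Fill the read buffer so there is buffered data that must be drained first.
    reader.fill_buf()?;
    let mut writer = BufWriter::new(File::create(dst)?);
    // Pending bytes in the write buffer must end up before the copied data.
    writer.write_all(b"head:")?;
    // io::copy sees Take<BufReader<File>> -> BufWriter<File>; all std-owned types.
    let copied = io::copy(&mut reader.take(limit), &mut writer)?;
    writer.flush()?;
    Ok(copied)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let src = dir.join(format!("copy-sketch-{}.source", std::process::id()));
    let dst = dir.join(format!("copy-sketch-{}.sink", std::process::id()));
    fs::write(&src, b"abcdefghijklmnopq")?;
    let copied = copy_buffered(&src, &dst, 9)?;
    let sink = fs::read(&dst)?;
    println!("copied {} bytes, sink holds {:?}", copied, String::from_utf8_lossy(&sink));
    fs::remove_file(&src)?;
    fs::remove_file(&dst)
}
```

Running this under strace on Linux is one way to check which of the syscalls listed above are used for a given pair of descriptors.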

Benchmarks


OLD

test io::tests::bench_file_to_file_copy         ... bench:      21,002 ns/iter (+/- 750) = 6240 MB/s    [ext4]
test io::tests::bench_file_to_file_copy         ... bench:      35,704 ns/iter (+/- 1,108) = 3671 MB/s  [btrfs]
test io::tests::bench_file_to_socket_copy       ... bench:      57,002 ns/iter (+/- 4,205) = 2299 MB/s
test io::tests::bench_socket_pipe_socket_copy   ... bench:     142,640 ns/iter (+/- 77,851) = 918 MB/s

NEW

test io::tests::bench_file_to_file_copy         ... bench:      14,745 ns/iter (+/- 519) = 8889 MB/s    [ext4]
test io::tests::bench_file_to_file_copy         ... bench:       6,128 ns/iter (+/- 227) = 21389 MB/s   [btrfs]
test io::tests::bench_file_to_socket_copy       ... bench:      13,767 ns/iter (+/- 3,767) = 9520 MB/s
test io::tests::bench_socket_pipe_socket_copy   ... bench:      26,471 ns/iter (+/- 6,412) = 4951 MB/s

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @KodrAus (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 7, 2020
@the8472 the8472 force-pushed the spec-copy branch 2 times, most recently from c9d1e81 to ac58849 Compare August 8, 2020 00:51
fn copy_specialization() -> Result<()> {
let path = crate::env::temp_dir();
let source_path = path.join("copy-spec.source");
let sink_path = path.join("copy-spec.sink");
Member Author

Does std have some policy about tests polluting the filesystem? There doesn't seem to be any utility for temporary files, and libc doesn't even have a wrapper for memfd_create :(

@bors
Contributor

bors commented Aug 19, 2020

☔ The latest upstream changes (presumably #75715) made this pull request unmergeable. Please resolve the merge conflicts.

@the8472 the8472 force-pushed the spec-copy branch 2 times, most recently from 8258196 to f867287 Compare August 20, 2020 08:33
@bors
Contributor

bors commented Sep 1, 2020

☔ The latest upstream changes (presumably #76047) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Contributor

bors commented Sep 3, 2020

☔ The latest upstream changes (presumably #76265) made this pull request unmergeable. Please resolve the merge conflicts.

@KodrAus
Contributor

KodrAus commented Sep 4, 2020

Thanks for your patience on this so far @the8472! Since this is quite a large PR it'll take me a while to dig through it all.

It's tricky to benchmark IO code, but I would be interested to see roughly what the impact of this is so we can justify the new machinery a bit.

@the8472
Member Author

the8472 commented Sep 4, 2020

@KodrAus it's not just about speed (but that ought to be a benefit too, sure) but also about using copy_file_range which reduces storage footprint on copy-on-write filesystems.

But I'll try to come up with a benchmark.

That said, I think #75428, being a bugfix, has a higher priority than this.

@bors
Contributor

bors commented Sep 5, 2020

☔ The latest upstream changes (presumably #75428) made this pull request unmergeable. Please resolve the merge conflicts.

@the8472
Member Author

the8472 commented Sep 5, 2020

@KodrAus

Edit: posted more recent benchmarks in initial comment

All benchmarks copy 128KiB per bencher iteration. Larger files will probably achieve higher speeds.
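
As a sanity check, the MB/s column in the results can be reproduced from the ns/iter column and the 128KiB figure. The sketch below assumes decimal megabytes and truncating integer division; the function name is made up for the example.

```rust
/// Bytes copied per bencher iteration (128 KiB, as stated above).
const BYTES_PER_ITER: u64 = 128 * 1024;

/// Throughput in MB/s (10^6 bytes per second), truncated toward zero.
/// Assumption: this mirrors how the bench harness derives its MB/s column.
fn throughput_mb_s(ns_per_iter: u64) -> u64 {
    // bytes per nanosecond equals GB/s, so scale by 1000 to get MB/s.
    BYTES_PER_ITER * 1000 / ns_per_iter
}

fn main() {
    // ns/iter values taken from the file-to-file ext4 rows.
    for (name, ns) in [("old ext4", 21_002u64), ("new ext4", 14_745u64)] {
        println!("{}: {} MB/s", name, throughput_mb_s(ns));
    }
}
```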

While benchmarking I have noticed that results involving pipes can be quite erratic since currently the only form of pipes exposed in the public API is for child process stdio, which means the performance characteristics depend on the process on the other end.
I have used yes and dd to get the loop of syscalls that I want, but even minor changes can mean one process overtakes another, which leads to blocking IO calls suddenly showing drastically different performance. The generic io::copy loop uses 1 read and 1 write syscall with an 8k buffer each. While that is slower in principle, it can occasionally show higher throughput in benchmarks: splice finishes so fast that the source/sink becomes empty/full, so the process gets suspended and incurs additional context switch costs, whereas the constant switching between read and write means the process never gets suspended because it is the limiting factor.
sendfile from file to pipe works on modern kernels and doesn't seem to suffer from this erratic behavior because it stays in kernel space and just waits when the pipe is full, thus avoiding the context switch overhead.

So I'll go ahead and change the behavior to do syscall bashing to figure out which one works on a pair of file descriptors, with this preference ranking: copy_file_range > sendfile > splice > read+write
Modern kernels should then mostly succeed with the first two which provide the most benefits while older kernels will hit the fallbacks more frequently.

@the8472
Member Author

the8472 commented Sep 10, 2020

@KodrAus updated and shaved off some syscalls. You can find benchmarks on top.

@KodrAus
Contributor

KodrAus commented Sep 11, 2020

Thanks @the8472! I'll give this a thorough review now 👀

@KodrAus
Contributor

KodrAus commented Sep 30, 2020

Thanks for your patience on this one @the8472! I just had to confirm that we're ok landing features that depend on specialization again (we had a block on it for a while, but with min_specialization things are ok again)

@the8472
Member Author

the8472 commented Oct 1, 2020

Ok, but if you had mentioned that I could have pointed you to other specialization PRs that have landed since. And I am still anticipating the review itself.

Contributor

@KodrAus KodrAus left a comment

Thanks for working on this @the8472!

I've left a few comments to start. I think we should refactor this into the sys module, to also avoid having to leak the CopyResult types that really only make sense for this specific implementation of io::copy.

I haven't dug into the specialized implementations just yet, or the changes to unix::fs.

fn fd_to_meta<T: AsRawFd>(fd: &T) -> FdMeta {
let fd = fd.as_raw_fd();
let file: ManuallyDrop<File> = ManuallyDrop::new(unsafe { File::from_raw_fd(fd) });
Contributor

It's unfortunate we don't seem to have a better way to get the metadata for a file descriptor.

Is it actually always valid for us to convert an arbitrary file descriptor into a File?

Member Author

This is just a nicer way to call fstat/statx. It's equivalent to but more efficient and reliable than File::open("/proc/self/fd/<fd>")?.metadata()?
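
A standalone sketch of that trick, for readers who want to try it outside std: wrap the raw descriptor in a File so that `metadata()` can be called, and use ManuallyDrop so the wrapper never closes a descriptor it does not own. The function name `metadata_of` is made up here, the example is Unix-only, and the safety argument in the comment is an assumption about how the caller uses it.

```rust
use std::fs::{self, File, Metadata};
use std::io;
use std::mem::ManuallyDrop;
use std::os::unix::io::{AsRawFd, FromRawFd};

/// Queries metadata for any fd-backed object without taking ownership of the fd.
/// (Hypothetical helper; Unix-only.)
fn metadata_of<T: AsRawFd>(source: &T) -> io::Result<Metadata> {
    let fd = source.as_raw_fd();
    // SAFETY (assumed): `source` keeps the descriptor open for the duration of
    // this call, and ManuallyDrop ensures the temporary File never closes it.
    let file = ManuallyDrop::new(unsafe { File::from_raw_fd(fd) });
    file.metadata()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("fd-meta-sketch-{}", std::process::id()));
    fs::write(&path, b"0123456789abcdefg")?;
    let file = File::open(&path)?;
    let meta = metadata_of(&file)?;
    println!("is_file = {}, len = {}", meta.is_file(), meta.len());
    // The original handle is still open and usable afterwards.
    println!("still open, len = {}", file.metadata()?.len());
    fs::remove_file(&path)
}
```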

match result {
CopyResult::Ended(Ok(bytes_copied)) => return Ok(bytes_copied + written),
CopyResult::Ended(err) => return err,
CopyResult::Fallback(bytes) => written += bytes,
Contributor

Is it expected for us to fallthrough to the next if here and then to the generic copy at the end?

Member Author

Yes, that's why the enum value is called Fallback. If that is not obvious it might need a better name.
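
To make the fallthrough concrete, here is a simplified model of the control flow in the snippet above. The CopyResult variants are taken from the snippet; everything else (the function-pointer strategies, their names, the byte counts) is invented for illustration and does not reflect the real implementation, which works on file descriptors.

```rust
use std::io;

/// Outcome of one specialized attempt (variants modeled on the snippet above).
enum CopyResult {
    /// Terminal result: success with the bytes copied, or an error.
    Ended(io::Result<u64>),
    /// The strategy gave up after copying this many bytes; try the next one.
    Fallback(u64),
}

/// Tries each strategy in order, carrying over bytes already written, and
/// finally hands the remainder to the generic copy.
fn copy_with_fallbacks(
    strategies: &[fn(u64) -> CopyResult],
    generic: fn(u64) -> io::Result<u64>,
    total: u64,
) -> io::Result<u64> {
    let mut written = 0;
    for strategy in strategies {
        match strategy(total - written) {
            CopyResult::Ended(Ok(n)) => return Ok(n + written),
            CopyResult::Ended(err) => return err,
            CopyResult::Fallback(n) => written += n,
        }
    }
    generic(total - written).map(|n| n + written)
}

// Hypothetical strategies standing in for the real syscalls.
fn unsupported(_remaining: u64) -> CopyResult {
    CopyResult::Fallback(0)
}
fn partial(remaining: u64) -> CopyResult {
    CopyResult::Fallback(remaining.min(4))
}
fn complete(remaining: u64) -> CopyResult {
    CopyResult::Ended(Ok(remaining))
}
fn read_write_loop(remaining: u64) -> io::Result<u64> {
    Ok(remaining)
}

fn main() -> io::Result<()> {
    let via_generic = copy_with_fallbacks(&[unsupported, partial], read_write_loop, 10)?;
    let via_special = copy_with_fallbacks(&[partial, complete], read_write_loop, 10)?;
    println!("generic path: {}, specialized path: {}", via_generic, via_special);
    Ok(())
}
```

The point of accumulating `written` is that a strategy may make partial progress before discovering it cannot continue, and those bytes still count toward the final total.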

match result {
CopyResult::Ended(Ok(bytes_copied)) => return Ok(bytes_copied + written),
CopyResult::Ended(err) => return err,
CopyResult::Fallback(bytes) => written += bytes,
Contributor

Is it expected for us to fallthrough to the next if here and then to the generic copy at the end?

Member Author

yes

@the8472 the8472 force-pushed the spec-copy branch 2 times, most recently from d2bea7b to a7e6535 Compare October 7, 2020 22:01
@the8472
Member Author

the8472 commented Oct 7, 2020

Should be ready for another review pass.

Contributor

@KodrAus KodrAus left a comment

Thanks for working on this! I've done another pass over the code. I think the implementation is good, but I think we should try to limit the visibility of the internals needed by the specialized copy as much as possible. Ideally, I think the module structure could look something like this:

pub mod sys {
    pub mod unix {
        mod copy {
            #[cfg(any(target_os = "linux", target_os = "android"))]
            pub fn copy_fs(from: &Path, to: &Path) -> io::Result<u64> {}

            pub fn copy<R, W>(reader: R, writer: W) -> io::Result<usize> {}
        }

        pub mod fs {
            #[cfg(any(target_os = "linux", target_os = "android"))]
            pub use super::copy::copy_fs as copy;
        }

        pub use self::copy::copy;
    }
}

pub mod io {
    mod copy {
        pub fn copy<R, W>(reader: R, writer: W) -> io::Result<usize> {
            cfg_if::cfg_if! {
                if #[cfg(unix)] {
                    crate::sys::unix::copy(reader, writer)
                } else {
                    copy_fallback(reader, writer)
                }
            }
        }

        pub(crate) fn copy_fallback<R, W>(reader: R, writer: W) -> io::Result<usize> {}
    }

    pub use self::copy::copy;
}

That way none of the CopyResult or other internal enums need to be visible in the rest of the crate. What do you think?

@@ -315,6 +315,7 @@
#![feature(toowned_clone_into)]
#![feature(total_cmp)]
#![feature(trace_macros)]
#![feature(try_blocks)]
Contributor

I'm a little surprised this wasn't here already... As far as I know the feature is somewhat blocked, but it looks like we do use it elsewhere in the repo so I'm ok with adding it here too.

Member Author

I only used it for cleanup in tests. But that was before I learned that other IO tests don't clean up the files they create either. So removing it and adding to the clutter instead would also be an option.

/// The general read-write-loop implementation of
/// `io::copy` that is used when specializations are not available or not applicable.
pub(crate) fn generic_copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W) -> io::Result<u64>
Contributor

I think a more descriptive name for this might be copy_fallback since we only use it when specialized implementations aren't available.

Member Author

I'm not sure if fallback is the right word. From the perspective of the specializations it indeed is the fallback. But the specializations only apply in some narrow scenarios. In the more general case of arbitrary Read/Write pairs this is the default strategy.
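
For reference, a minimal sketch of what such a default read-write loop looks like for arbitrary Read/Write pairs. This is not the std source; the function name is made up, and the 8 KiB buffer size is taken from the earlier comment about the generic loop.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Plain read-write loop: one read and one write_all per iteration.
/// (Illustrative only; named differently from the function under review.)
fn read_write_copy<R: Read + ?Sized, W: Write + ?Sized>(
    reader: &mut R,
    writer: &mut W,
) -> io::Result<u64> {
    let mut buf = [0u8; 8 * 1024];
    let mut written = 0u64;
    loop {
        let len = match reader.read(&mut buf) {
            // A zero-length read signals EOF.
            Ok(0) => return Ok(written),
            Ok(len) => len,
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..len])?;
        written += len as u64;
    }
}

fn main() -> io::Result<()> {
    // Works for any Read/Write pair, here a byte slice and a Vec.
    let data = vec![7u8; 20_000];
    let mut src: &[u8] = &data;
    let mut sink = Vec::new();
    let copied = read_write_copy(&mut src, &mut sink)?;
    println!("copied {} bytes in chunks of at most 8 KiB", copied);
    Ok(())
}
```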

@the8472
Member Author

the8472 commented Oct 12, 2020

@KodrAus

I think the module structure could look something like this: [...]

The issue is that there are more implementations of copy(from: &Path, to: &Path) in the unix::fs module. E.g. a macos one. Should that also be moved into the new unix::copy module?

ddiss added a commit to ddiss/dracut that referenced this pull request Aug 27, 2021
dracut-cpio is a minimal cpio archive creation utility written in Rust.
It provides support for a minimal set of features needed to create
performant and space-efficient initramfs archives:
- "newc" archive format only
- reproducible; inode numbers, uid/gid and mtime can be explicitly set
- data segment copy-on-write reflinks
  + using Rust io::copy()'s native copy_file_range() support[1]
  + optional archive data segment alignment for optimal reflink use[2]
- hardlink support
- comprehensive tests asserting GNU cpio binary output compatibility

1. Rust io::copy() copy_file_range()
   rust-lang/rust#75272

2. Data segment alignment
   We're bending the newc spec a bit to inject zeros after the file path
   to provide data segment alignment. These zeros are accounted for in
   the namesize, but some applications may only expect a single
   zero-terminator (and 4 byte alignment). GNU cpio and Linux initramfs
   handle this fine as long as PATH_MAX isn't exceeded.

Signed-off-by: David Disseldorp <[email protected]>
ddiss added a commit to ddiss/dracut that referenced this pull request Sep 17, 2021
ddiss added a commit to ddiss/dracut that referenced this pull request Sep 26, 2021
(cherry picked from commit 300e4b1)
@the8472 the8472 mentioned this pull request Oct 12, 2021
haraldh pushed a commit to dracutdevs/dracut that referenced this pull request Nov 24, 2021
ddiss added a commit to ddiss/dracut that referenced this pull request Dec 6, 2021
(cherry picked from commit a9c6704)
cgwalters added a commit to cgwalters/ostree-rs-ext that referenced this pull request Jan 26, 2022
I stumbled across the fact that we no longer need
coreos/openat-ext@c377a54
because
rust-lang/rust#75272
landed just a few months after!

While we're here, slightly clean up the fd dance to make things a bit
safer using `BorrowedFd`.  It's interesting to note here that with
io-lifetimes we could add a method to the glib crate to borrow the
underlying fd safely.
cgwalters added a commit to coreos/openat-ext that referenced this pull request Jan 26, 2022
cgwalters added a commit to cgwalters/coreos-installer that referenced this pull request Feb 15, 2022
desbma added a commit to desbma/rsop that referenced this pull request Jul 24, 2022
It seems to be upstream now, see: rust-lang/rust#75272
However for rsop splice is not used anymore, see 'test_splice' script.
desbma added a commit to desbma/rsop that referenced this pull request Jan 19, 2024
It is now done transparently by the Rust stdlib: rust-lang/rust#75272
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specialize std::io::copy for files, sockets, or pipes on Linux
9 participants