
read_to_end grows buf beyond take limit #5594

Closed
danielnorberg opened this issue Apr 3, 2023 · 5 comments · Fixed by #5610
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate M-io Module: tokio/io

Comments


danielnorberg commented Apr 3, 2023

Version
1.25.0

Platform

Linux dano-worker-nvme-12-48g 5.15.0-1030-gcp #37~20.04.1-Ubuntu SMP Mon Feb 20 04:30:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Description
When calling read_to_end on an AsyncRead with a take(n) limit and a buf Vec<u8> initialized with with_capacity(n), buf is unnecessarily grown beyond n, incurring an expensive reallocation.

let stream = ...; // An AsyncRead yielding 1 MB from an HTTP response body
let n: usize = 1024 * 1024;
let mut buf = Vec::with_capacity(n);
info!(
    "before read_to_end: buf len: {}, capacity: {}",
    buf.len(),
    buf.capacity()
);
stream.take(n as u64).read_to_end(&mut buf).await?;
info!(
    "after read_to_end: buf len: {}, capacity: {}",
    buf.len(),
    buf.capacity()
);

The code above yields the following info output:

before read_to_end: buf len: 0, capacity: 1048576
after read_to_end: buf len: 1048576, capacity: 2097152

The capacity of buf has doubled even though it holds exactly the number of bytes specified in take(n) and was initialized with with_capacity(n).

I would expect the take(n) to prevent the buf from being grown beyond n.

It seems to me that this is due to the unconditional call to buf.reserve(32) in poll_read_to_end.

@danielnorberg danielnorberg added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Apr 3, 2023
@Darksonn Darksonn added M-io Module: tokio/io E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate labels Apr 3, 2023
Contributor

Darksonn commented Apr 3, 2023

Thank you. This does indeed seem like a bug.

Contributor

tzx commented Apr 7, 2023

buf.reserve(32) does have a check. I think the doubling comes from Rust's amortized allocation strategy: Vec doubles its capacity when full. I believe that's also what the comment about the adaptive system refers to. Since the code checks for space (and allocates if the buffer is full) before polling, and EOF is signalled by a final 0-byte read, the buffer ends up doubled even though that last read returns nothing.

I would like to tackle this, but I don't know how, or whether it is even desirable. If you used take and you know the number of bytes, wouldn't read_exact be preferable? But suppose I want to solve this more generally: how would one detect an EOF without a 0-byte read? My initial thought is to read into a small temporary buffer when the main buffer is full, to check whether there are more bytes before allocating, but that doesn't seem ideal.

Contributor

Darksonn commented Apr 7, 2023

It will be somewhat of an optimization. Here's what I suggest: if the vector is exactly full, we can try reading into a [u8; 32] local variable instead of the vector, to avoid resizing it. Additionally, we should only attempt this optimization if the vector still has its original capacity, and if that starting capacity is non-zero.

This should handle the case where the vector has exactly the right capacity without reallocating. For cases where the capacity isn't sufficient, we pay for one extra small read, but that's probably fine since we only make one.

@danielnorberg
Author

Naively, I would have been tempted to use the information passed to take(n) to conclude that the stream is at EOF without an additional read, but I'm not familiar enough with the tokio implementation to know whether that's feasible.

As for read_exact: yes, that would be preferable in cases where the stream always has at least n bytes, but in my case it will have at most n bytes.

Contributor

Darksonn commented Apr 8, 2023

It would be possible to special-case take to do it, but it seems nice to have the optimization also work for e.g. a file whose length you checked beforehand, or other similar cases.

tzx added a commit to tzx/tokio that referenced this issue Apr 11, 2023
tzx added a commit to tzx/tokio that referenced this issue Apr 16, 2023