Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117151: optimize BufferedWriter(), do not buffer writes that are the buffer size #118037

Merged
merged 1 commit into from
Apr 23, 2024

Conversation

morotti
Copy link
Contributor

@morotti morotti commented Apr 18, 2024

gh-117151: optimize BufferedWriter(), do not buffer writes that are the buffer size

Hello,

I am having a look at the buffering code as part of the linked discussion, #gh-117151
One thing that was puzzling me is finding the buffering code slightly slower than the non-buffering code, given the same conditions, so I started having a deeper look into it.

Could I get a second pair of eyes on this old piece of code?
BufferedWriter() is buffering calls that are the exact same size as the buffer.
It's a very common case to read/write in blocks of the exact buffer size.
I think this might actually be the primary use case, reading in chunks from disk/network and writing to disk.

I think it shouldn't buffer in that case. It's costing an extra memory copy and the buffer will be full and have to be written in the next call anyway.

Thoughts?

… are equal to the buffer size. avoid extra memory copy.

BufferedWriter() was buffering calls that are the exact same size as the buffer. it's a very common case to read/write in blocks of the exact buffer size.

it's pointless to copy a full buffer, it's costing extra memory copy and the full buffer will have to be written in the next call anyway.
Copy link

cpython-cla-bot bot commented Apr 18, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Apr 18, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@morotti
Copy link
Contributor Author

morotti commented Apr 18, 2024

Regarding the CLA, the CLA was signed by my organization yesterday.
Do you need to do something on your end to have the PR bot find the CLA with our org email?

@mdboom mdboom added the performance Performance or resource usage label Apr 18, 2024
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It can potentially affect the performance of BufferedRandom in rare cases, but I think that they are too rare, and it worst cases it only affects performance, not the behavior. If it affects behavior, then there are other existing bugs in the code.

@serhiy-storchaka
Copy link
Member

How much difference in speed is caused by this PR? If it is more than say 10%, it makes sense to add a short NEWS entry.

@morotti
Copy link
Contributor Author

morotti commented Apr 23, 2024

It's single percent difference, it might not be news worthy.

@serhiy-storchaka
Copy link
Member

The PR was created by @morotti, but the commit was created by @rmmancom. I suspect they have different email addresses. Only one of them signed the CLA. Please sign the CLA for other email address.

@morotti
Copy link
Contributor Author

morotti commented Apr 23, 2024

I've reconfirmed the CLA with both emails. is it okay now?

@serhiy-storchaka serhiy-storchaka merged commit 8fa1248 into python:main Apr 23, 2024
39 checks passed
@serhiy-storchaka
Copy link
Member

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants