gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k #118144

Open · wants to merge 11 commits into base: main
Conversation

morotti
Contributor

@morotti morotti commented Apr 22, 2024

See discussion gh-117151

This patch adjusts the buffer size. That gives 3 to 5 times I/O performance improvement on modern hardware.

  • increase io.DEFAULT_BUFFER_SIZE to 128k
  • fix open() to use max(st_blksize, io.DEFAULT_BUFFER_SIZE)
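The selection rule above can be sketched as a small helper. This is an illustrative model of the behavior the patch describes, not the actual CPython `open()` implementation; the function name `pick_buffer_size` is hypothetical.

```python
import io
import os

def pick_buffer_size(fd):
    """Sketch of the proposed rule: never buffer below
    io.DEFAULT_BUFFER_SIZE, but honor a larger filesystem block
    size if the OS reports one via st_blksize."""
    # st_blksize is not available on all platforms (e.g. Windows),
    # so fall back to 0 when it is missing.
    blksize = getattr(os.fstat(fd), "st_blksize", 0)
    return max(blksize, io.DEFAULT_BUFFER_SIZE)
```

On a filesystem reporting a 4 KiB block size this returns the (larger) default; on one reporting, say, a 1 MiB block size it would honor the filesystem's hint instead.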


cpython-cla-bot bot commented Apr 22, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app

bedevere-app bot commented Apr 22, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@encukou encukou added the performance Performance or resource usage label Apr 22, 2024

@morotti
Contributor Author

morotti commented Apr 25, 2024

@serhiy-storchaka hello, you were kind enough to review my first PR for buffering performance (118037); would you be able to review this PR too?

@eendebakpt
Contributor

eendebakpt commented Apr 30, 2024

@morotti You still need to write a news entry for the PR. For your other PR this was not required, but given the performance impact of this PR I think we should have one. The most convenient way is to click the Details button next to bedevere/news in the CI checks, which opens a tool for adding the news entry. For more information, see https://devguide.python.org/core-developers/committing/#updating-news-and-what-s-new-in-python

The CLA check is not passing yet, but it worked for the other PR (which was created after this one), so maybe it will resolve itself automatically in a couple of days.

@eendebakpt
Contributor

Pinging @benjaminp as expert on io. Could you review this PR? Are there other benchmarks (besides the one in the corresponding issue) we can perform to test this PR?

@morotti
Contributor Author

morotti commented Apr 30, 2024

Thank you for reviewing,

The CLA check is green now and I added a news entry.

@morotti
Contributor Author

morotti commented May 17, 2024

@eendebakpt @benjaminp could you review?

@eendebakpt
Contributor

> @eendebakpt @benjaminp could you review?

@morotti The PR looks good from my side, but I am no core dev so I cannot approve. Currently many core devs are at pycon US, so I think we should wait a bit more. If in a few weeks there has been no further response, we can post a message to discourse (see https://devguide.python.org/getting-started/pull-request-lifecycle/#reviewing).

@itamaro itamaro added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 23, 2024
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @itamaro for commit 3668d1a 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 23, 2024
@itamaro itamaro added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 1, 2024
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @itamaro for commit 28e7bb7 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 1, 2024
@erlend-aasland
Contributor

Requesting Serhiy's review, since he reviewed the other linked (and merged) PR #118037.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment


I was convinced that such a change could be beneficial.

But in case it has unexpected effects, please ask on https://discuss.python.org/. Also update the C implementation of open(), which is used by default.

Lib/_pyio.py (outdated review comment, resolved)
@serhiy-storchaka
Member

Please do not use rebase and force-push. It makes reviewing more difficult.

@morotti
Contributor Author

morotti commented Jun 17, 2024

apologies, I thought I had to squash the new commits and force-push.

@morotti
Contributor Author

morotti commented Jun 18, 2024

all adjusted

  • found and corrected another code path in the C API that was using the block size
  • added a test
  • updated the documentation

@cmaloney
Contributor

@morotti

  1. winconsoleio picks this up (https://github.com/python/cpython/blob/main/Modules/_io/winconsoleio.c#L425), and there is a past bug about a size cap (https://bugs.python.org/issue11395, https://github.com/python/cpython/blob/main/Modules/_io/winconsoleio.c#L1027-L1031). I don't think this PR breaks that (winconsoleio splits up oversized writes in write()), but the default buffer size there should likely be separated from this new, bigger constant, which will always require splitting write calls.
  2. do you have any performance numbers around local disk read() with this change? (Curious for small and large local cached file)
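The write-splitting mentioned in point 1 can be illustrated with a tiny helper. This is a hypothetical sketch of the pattern (not the `_io/winconsoleio.c` code): a backend with a per-call size cap has to break one large write into pieces below that cap.

```python
def chunked_write(write, data, cap=32767):
    """Split one oversized write into pieces no larger than `cap`,
    the way a console backend with a per-call size limit must.
    `write` is any callable that writes bytes and returns the count
    written; 32767 mirrors the historical WriteConsoleW limit."""
    written = 0
    for start in range(0, len(data), cap):
        written += write(data[start:start + cap])
    return written
```

With a 128 KiB default buffer, every flush through such a backend would need this splitting, which is why a separate (smaller) console default makes sense.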

@morotti
Contributor Author

morotti commented Jul 8, 2024

thanks, having a look,

looking at the old threads, the bug with winconsole destroying large input was reported in 2010.
that was the Windows XP era (2002-2014); it's not clear whether the bug still applies to supported Windows versions. the limitation is no longer listed on the MSDN page.
support for Windows XP was dropped long ago, and support for Windows 7 was dropped with the Python 3.11 release in 2022.

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232
> Correction: the MSDN documentation for WriteConsoleW does document a length limitation, but it says 64 KiB (which would be 32767 UTF-16 characters and a terminating NUL), not 26608 characters. The fix is the same regardless; I'll limit it to 10000 [UTF-16] characters.

@morotti
Contributor Author

morotti commented Jul 8, 2024

@cmaloney

I've adjusted winconsole to have a different default buffer size.
I've added comments with what I could gather from the threads.

for the impact on read(), it's quite variable.

  • larger write() gives a 3 to 5 times performance improvement, just from the I/O.
  • larger read() gives barely a few percent improvement (less than 10% on my machines), just from the I/O. the filesystem is really saving us from this terrible code!

HOWEVER, that's just for the I/O. the app will do something with each chunk, and any python code doing a not-so-small operation per chunk can see a very large impact.

we've found the bug in pip twice, pypa/pip#12810
there is code that downloads packages in 10240-byte chunks, and code that reads package files and computes the checksum in io.DEFAULT_BUFFER_SIZE=8192 chunks.

the download code is obliterated by the small chunks. it calls some code and updates a progress bar for each chunk; as much as 30% of the pip runtime is spent updating the progress bar when it's downloading packages 🫨.
the hashing code is merely a hashlib update(buffer) call in a loop. it gets a single-digit percent improvement from larger chunks.
so in the same app, you get both a case where large chunks make an enormous difference and one where they make little difference.

it's worth noting that I've found python 3.11 to be 4 times faster than previous versions in tight loops and list comprehensions. older python versions get much more impact from small chunks, due to the larger overhead of many more loop iterations and function calls.
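The hashing pattern described above (pip's cheap per-chunk case) can be sketched like this. The function name and the 128 KiB default are illustrative assumptions, not pip's or CPython's code.

```python
import hashlib

def file_sha256(path, chunk_size=128 * 1024):
    """Checksum a file in fixed-size chunks. The per-chunk Python
    work is just one read() and one update(), so bigger chunks only
    shave loop and call overhead; a heavier per-chunk body (like a
    progress-bar update) would benefit far more from fewer chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

The result is identical regardless of chunk size; only the number of loop iterations, and thus the Python-level overhead, changes.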

@cmaloney
Contributor

Sounds good to me. re: Windows, I dug into the docs and history a bit, and it looks like with Windows 8+ it's not necessary anymore. I'm seeing if it's possible to remove the special case in gh-121940.

@@ -1558,15 +1559,18 @@ def __init__(self, file, mode='r', closefd=True, opener=None):
os.set_inheritable(fd, False)

self._closefd = closefd
self._stat_atopen = os.fstat(fd)
Contributor


This needs to merge changes with #123412 (the stat result is now stashed as a member, so a number of optimizations can be done with it).

Labels: awaiting review, performance
10 participants