Significantly speed up parsing netloc in URL objects #1112

bdraco · 2024-09-05T23:58:29Z

What do these changes do?

Currently to make a URL object that is not already encoded we would call urlsplit to generate a SplitResult, and than we would always call .hostname, .port,.username, and .password, which all have to reparse the netloc each time.

To improve the performance

parse out the hostname, port,username, and password one time in a new function _split_netloc which is a slimmed down version of what SplitResult does that only does what we need.
wrap _split_netloc in lru_cache the same size as the one urlsplit uses since we tend to see the same netloc over and over again.
When URL is created with encoded=False, save the parsed values from the netloc and SplitResult into the cache so they do not have to be calculated again the first time the properties are accessed as it avoids a second call into SplitResult's parsers of the same data.
When URL is created with encoded=True, parse the netloc as soon as raw_password, raw_user, raw_host or explict_port are accessed and cache the result of the other values to avoid having to parse again when they are accessed since is almost guaranteed that we will access raw_password if we access raw_user and vise-versa. Same with explict_port and raw_host.

Even on the cache miss case this is much faster because we avoid so many reparses of the same data. For an aiohttp request with url passed as string we would previously have to parse the netloc 8x (4x for the properties to build the netloc, and 4x for the reading the properties back from yarl)

Are there changes in behavior for the user?

The additional LRU will use a tiny bit more memory, but is limited to 128 keys which will make it negligible

# timeit.timeit('URL("http://user:[email protected]?a+b=c+d").raw_host', globals=globals(), number=100000)
1.10.0.dev0: 0.13812883291393518
1.9.11: 0.2996841249987483

# timeit.timeit('URL("http://example.com?a+b=c+d").raw_host', globals=globals(), number=100000)
1.10.0.dev0: 0.11170849995687604
1.9.11: 0.22350204200483859

# timeit.timeit('URL("http://example.com").raw_host', globals=globals(), number=100000)
1.10.0.dev0: 0.09903637506067753
1.9.11: 0.22337879217229784

# timeit.timeit('URL("http://example.com")', globals=globals(), number=100000)
1.10.0.dev0: 0.09579283301718533
1.9.11: 0.16477595805190504

before

after

Related issue

#196 (only partially solves the issues mentioned there)

This reverts commit 855f57b.

codecov · 2024-09-06T00:19:22Z

Codecov Report

Attention: Patch coverage is 96.89922% with 4 lines in your changes missing coverage. Please review.

Project coverage is 95.08%. Comparing base (926b9dc) to head (02e72be).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
yarl/_url.py	94.52%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1112      +/-   ##
==========================================
+ Coverage   95.06%   95.08%   +0.02%     
==========================================
  Files          30       30              
  Lines        4497     4600     +103     
  Branches      396      410      +14     
==========================================
+ Hits         4275     4374      +99     
- Misses        196      200       +4     
  Partials       26       26

Flag	Coverage Δ
CI-GHA	`95.04% <96.89%> (+0.02%)`	⬆️
MyPy	`40.65% <47.61%> (-0.07%)`	⬇️
OS-Linux	`99.37% <100.00%> (+0.01%)`	⬆️
OS-Windows	`99.47% <100.00%> (+0.01%)`	⬆️
OS-macOS	`99.07% <100.00%> (+0.02%)`	⬆️
Py-3.10.11	`98.96% <100.00%> (+0.02%)`	⬆️
Py-3.10.14	`99.21% <100.00%> (+0.02%)`	⬆️
Py-3.11.9	`99.21% <100.00%> (+0.02%)`	⬆️
Py-3.12.5	`99.21% <100.00%> (+0.02%)`	⬆️
Py-3.13.0-rc.1	`99.21% <100.00%> (+0.02%)`	⬆️
Py-3.8.10	`98.91% <100.00%> (+0.03%)`	⬆️
Py-3.8.18	`99.15% <100.00%> (+0.02%)`	⬆️
Py-3.9.13	`98.91% <100.00%> (+0.03%)`	⬆️
Py-3.9.19	`99.15% <100.00%> (+0.02%)`	⬆️
Py-pypy7.3.11	`99.21% <100.00%> (+0.02%)`	⬆️
Py-pypy7.3.16	`99.21% <100.00%> (+0.02%)`	⬆️
Py-pypy7.3.17	`99.23% <100.00%> (+0.02%)`	⬆️
VM-macos-latest	`99.07% <100.00%> (+0.02%)`	⬆️
VM-ubuntu-latest	`99.37% <100.00%> (+0.01%)`	⬆️
VM-windows-latest	`99.47% <100.00%> (+0.01%)`	⬆️
pytest	`99.37% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bdraco · 2024-09-06T05:05:57Z

Need to look at the aiohttp server case a bit more. In some cases sending an unencoded url beats the pre encoded case depending on what we access because of cache misses and reparse of the same data

yarl/_url.py

bdraco · 2024-09-06T06:22:19Z

Home Assistant test case:

3 min up time (expect startup to resync more data)
2024-09-05 20:21:33.691 CRITICAL (SyncWorker_26) [homeassistant.components.profiler] Cache stats for lru_cache <function URL._split_netloc at 0x7ff8fea02700> at /usr/local/lib/python3.12/site-packages/yarl/_url.py: CacheInfo(hits=2086, misses=123, maxsize=128, currsize=123)
8 min up time
2024-09-05 20:29:30.539 CRITICAL (SyncWorker_50) [homeassistant.components.profiler] Cache stats for lru_cache <function URL._split_netloc at 0x7ff8fea02700> at /usr/local/lib/python3.12/site-packages/yarl/_url.py: CacheInfo(hits=3813, misses=126, maxsize=128, currsize=126)
30 min up time
2024-09-05 20:45:44.736 CRITICAL (SyncWorker_35) [homeassistant.components.profiler] Cache stats for lru_cache <function URL._split_netloc at 0x7ff8fea02700> at /usr/local/lib/python3.12/site-packages/yarl/_url.py: CacheInfo(hits=7608, misses=127, maxsize=128, currsize=127)

new looks like this now:

yarl/_url.py

bdraco · 2024-09-06T06:31:47Z

_url.py is just barely above client_proto.py in the profile now. It was far above it before.

bdraco · 2024-09-06T06:49:55Z

Will sleep on this one to see if there is any more coverage that would be good to add

yarl/_url.py

bdraco · 2024-09-06T18:03:08Z

Retest on Home Assistant production installs looks good

bdraco · 2024-09-07T04:02:23Z

3 hours
2024-09-06 23:01:09.034 CRITICAL (SyncWorker_30) [homeassistant.components.profiler] Cache stats for lru_cache <function URL._split_netloc at 0x7f23be2e5940> at /usr/local/lib/python3.12/site-packages/yarl/_url.py: CacheInfo(hits=37364, misses=104, maxsize=128, currsize=104)

for comparison

2024-09-06 23:01:09.032 CRITICAL (SyncWorker_30) [homeassistant.components.profiler] Cache stats for lru_cache <function urlsplit at 0x7f23becb7ce0> at /usr/local/lib/python3.12/urllib/parse.py: CacheInfo(hits=57950, misses=2839, maxsize=128, currsize=128)

bdraco · 2024-09-07T17:02:46Z

13 hours

2024-09-07 07:01:46.909 CRITICAL (SyncWorker_32) [homeassistant.components.profiler] Cache stats for lru_cache <function URL._split_netloc at 0x7f4f1c6e6700> at /usr/local/lib/python3.12/site-packages/yarl/_url.py: CacheInfo(hits=157229, misses=129, maxsize=128, currsize=128)

bdraco · 2024-09-07T17:09:04Z

17 hours on another HA install
2024-09-07 12:08:06.055 CRITICAL (SyncWorker_7) [homeassistant.components.profiler] Cache stats for lru_cache <function URL._split_netloc at 0x7f23be2e5940> at /usr/local/lib/python3.12/site-packages/yarl/_url.py: CacheInfo(hits=173706, misses=105, maxsize=128, currsize=105)

So its turning out to be a really great use of an LRU

bdraco added 21 commits September 5, 2024 17:10

wip

a30864c

wip

a78a144

wip

68cf631

cache

f28baa7

optimize

b58592a

reduce

c122031

reduce

a9dcda1

coverage

66a9938

cache test

a43e71e

cache test

70a2a0c

tweaks

86d76c3

save the whole thing

d46de13

save the whole thing

3019e0c

reduce

04029ef

tweak

4c03a43

match urlsplit

855f57b

Revert "match urlsplit"

4e11af0

This reverts commit 855f57b.

cache

9beacb4

preen

51f8428

preen

499c15f

preen

7750f68

bdraco commented Sep 6, 2024

View reviewed changes

yarl/_url.py Outdated Show resolved Hide resolved

parse netloc only once per url

d53ffa7

bdraco changed the title ~~Significantly speed up creating unencoded URL objects~~ Significantly speed up parsing netloc in URL objects Sep 6, 2024

bdraco added 3 commits September 6, 2024 00:53

try to keep more of the code the same

de9331a

update docstring

3ccd4c6

dry

38b823b

bdraco commented Sep 6, 2024

View reviewed changes

yarl/_url.py Show resolved Hide resolved

webknjaz reviewed Sep 6, 2024

View reviewed changes

yarl/_url.py Outdated Show resolved Hide resolved

bdraco added 2 commits September 6, 2024 11:52

Merge branch 'master' into split_netloc

b4c6b1e

changelog

83eb4f3

psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Sep 6, 2024

bdraco added 6 commits September 6, 2024 11:55

Merge branch 'master' into split_netloc

5ee53a2

Union

c6ffe6b

tweaks to make it a bit more readable and avoid the getitems

454f6e6

Merge branch 'master' into split_netloc

b426678

fix refactoring error

939d976

use PyIsTrue

02e72be

bdraco commented Sep 6, 2024

View reviewed changes

yarl/_url.py Show resolved Hide resolved

bdraco commented Sep 6, 2024

View reviewed changes

yarl/_url.py Show resolved Hide resolved

bdraco marked this pull request as ready for review September 6, 2024 18:02

bdraco merged commit bb585b0 into master Sep 6, 2024
47 of 49 checks passed

bdraco deleted the split_netloc branch September 6, 2024 18:03

bdraco mentioned this pull request Sep 6, 2024

Add types to the internal cache #1117

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Significantly speed up parsing netloc in URL objects #1112

Significantly speed up parsing netloc in URL objects #1112

bdraco commented Sep 5, 2024 •

edited

Loading

codecov bot commented Sep 6, 2024 •

edited

Loading

bdraco commented Sep 6, 2024

bdraco commented Sep 6, 2024 •

edited

Loading

bdraco commented Sep 6, 2024

bdraco commented Sep 6, 2024

bdraco commented Sep 6, 2024

bdraco commented Sep 7, 2024 •

edited

Loading

bdraco commented Sep 7, 2024

bdraco commented Sep 7, 2024

Significantly speed up parsing netloc in URL objects #1112

Significantly speed up parsing netloc in URL objects #1112

Conversation

bdraco commented Sep 5, 2024 • edited Loading

What do these changes do?

Are there changes in behavior for the user?

Related issue

codecov bot commented Sep 6, 2024 • edited Loading

Codecov Report

bdraco commented Sep 6, 2024

bdraco commented Sep 6, 2024 • edited Loading

bdraco commented Sep 6, 2024

bdraco commented Sep 6, 2024

bdraco commented Sep 6, 2024

bdraco commented Sep 7, 2024 • edited Loading

bdraco commented Sep 7, 2024

bdraco commented Sep 7, 2024

bdraco commented Sep 5, 2024 •

edited

Loading

codecov bot commented Sep 6, 2024 •

edited

Loading

bdraco commented Sep 6, 2024 •

edited

Loading

bdraco commented Sep 7, 2024 •

edited

Loading