Simplify stack #738

Closed · wants to merge 8 commits

Conversation

rikhuijzer
Contributor

This PR reduces compilation latency and simplifies the logic around HTTP.stack. As a side-effect, #469 should now be fixed as well. The benchmarks below are on Julia 1.7-beta3.0.

master:

julia> @time HTTP.get("https://example.com")
6.148291 seconds (7.87 M allocations: 415.763 MiB, 5.65% gc time)

this PR:

julia> @time HTTP.get("https://example.com")
5.651283 seconds (7.49 M allocations: 398.512 MiB, 1.80% gc time, 0.88% compilation time)

So, about a 4% reduction in allocations and roughly half a second less compilation time on the first request. The difference gets bigger when users make multiple types of requests:

master:

julia> @time begin
       HTTP.get("https://example.com")
       HTTP.get("https://example.com"; redirect=false)
       HTTP.get("https://example.com"; detect_content_type=true)
       HTTP.get("https://example.com"; redirect=false, detect_content_type=true)
       end
 8.300049 seconds (10.08 M allocations: 534.940 MiB, 4.00% gc time)

this PR:

julia> @time begin
       HTTP.get("https://example.com")
       HTTP.get("https://example.com"; redirect=false)
       HTTP.get("https://example.com"; detect_content_type=true)
       HTTP.get("https://example.com"; redirect=false, detect_content_type=true)
       end
  8.110491 seconds (9.45 M allocations: 504.894 MiB, 2.18% gc time, 0.62% compilation time)

This is a 5.6% reduction in allocated memory.

The reason that this PR reduces the latency is that HTTP.request was specialized too much. For each different stack, HTTP.request got multiple new method instances.
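To illustrate the specialization problem, here is a hypothetical sketch (not HTTP.jl's actual code) of how a fully nested layer type leads to one set of method instances per stack combination:

```julia
# Hypothetical layer types, nested via a Next type parameter,
# mimicking the shape of the original HTTP.stack design.
abstract type Layer end
struct RedirectLayer{Next <: Layer} <: Layer end
struct MessageLayer{Next <: Layer} <: Layer end
struct BottomLayer <: Layer end

# Each layer's request method forwards to the Next type parameter,
# so the compiler specializes it for every distinct tail of the stack.
request(::Type{RedirectLayer{N}}, url) where {N} = request(N, url)
request(::Type{MessageLayer{N}}, url) where {N} = request(N, url)
request(::Type{BottomLayer}, url) = url

# Two different stacks produce two distinct sets of compiled
# specializations of the same layer methods:
request(RedirectLayer{MessageLayer{BottomLayer}}, "https://example.com")
request(MessageLayer{BottomLayer}, "https://example.com")
```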

These changes should not affect runtime performance. It would technically be possible to define a Stack{U,V} type which would allow all HTTP.request calls to avoid runtime dispatch. However, since some request methods are quite large, I don't think the extra memory consumption would be worth it.

Runtime benchmark for master
julia> @benchmark HTTP.get("https://example.com")
BenchmarkTools.Trial: 67 samples with 1 evaluation.
 Range (min … max):  75.371 ms …  76.668 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     75.577 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   75.630 ms ± 218.098 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃  ▁█   ▃▁ ▁      ▁
  ▄▁▄█▇▇██▄▇▄██▇█▇▇▇▇▇▁█▁▇▄▄▁▇▁▇▄▇▄▁▁▁▁▄▁▁▄▄▁▁▄▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄ ▁
  75.4 ms         Histogram: frequency by time         76.2 ms <

 Memory estimate: 13.78 KiB, allocs estimate: 163.
Runtime benchmark for this PR
julia> @benchmark HTTP.get("https://example.com")
BenchmarkTools.Trial: 66 samples with 1 evaluation.
 Range (min … max):  75.478 ms …  77.296 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     75.705 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   75.751 ms ± 247.670 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▄    ▄█     ▁▁▄▄█▁  ▁▄ ▄▁ ▁▁   ▁    ▁        ▁            ▁
  ▆█▁▁▁▆██▆▆▆▆▁██████▆▆██▆██▆██▆▆▁█▁▆▁▆█▁▆▁▁▁▁▆▁█▁▁▁▁▁▆▁▁▁▁▁▁█ ▁
  75.5 ms         Histogram: frequency by time         76.2 ms <

 Memory estimate: 16.08 KiB, allocs estimate: 193.

@codecov-commenter

codecov-commenter commented Aug 1, 2021

Codecov Report

Merging #738 (a9e3e38) into master (11bcccc) will increase coverage by 0.16%.
The diff coverage is 89.33%.


@@            Coverage Diff             @@
##           master     #738      +/-   ##
==========================================
+ Coverage   76.89%   77.05%   +0.16%     
==========================================
  Files          38       38              
  Lines        2445     2467      +22     
==========================================
+ Hits         1880     1901      +21     
- Misses        565      566       +1     
Impacted Files Coverage Δ
src/CanonicalizeRequest.jl 0.00% <0.00%> (ø)
src/ContentTypeRequest.jl 0.00% <0.00%> (ø)
src/DebugRequest.jl 0.00% <0.00%> (ø)
src/Pairs.jl 66.66% <ø> (ø)
src/TopRequest.jl 0.00% <0.00%> (-100.00%) ⬇️
src/AWS4AuthRequest.jl 79.36% <100.00%> (ø)
src/BasicAuthRequest.jl 100.00% <100.00%> (ø)
src/ConnectionPool.jl 80.74% <100.00%> (ø)
src/ConnectionRequest.jl 83.33% <100.00%> (ø)
src/CookieRequest.jl 87.09% <100.00%> (ø)
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@c42f
Contributor

c42f commented Aug 5, 2021

So the main change here is that the stack is no longer one big nested layer type, but a linked list of Stack{LayerType} structs, with a next pointer of type Stack that is dispatched on at runtime.
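The description above can be sketched roughly as follows (type and field names are assumed, not the PR's exact code):

```julia
# Linked-list stack: each node carries only its own layer as a type
# parameter; the tail is held behind the abstract AbstractStack type.
abstract type AbstractStack end
struct EmptyStack <: AbstractStack end
struct Stack{LayerType} <: AbstractStack
    next::AbstractStack   # abstract field: the next node's concrete type
end                       # is only known at runtime (runtime dispatch)

# Build a two-layer stack front to back:
s = Stack{:RedirectLayer}(Stack{:MessageLayer}(EmptyStack()))
```

Because next has an abstract type, a layer's request method no longer sees the whole remaining stack in its signature, which is exactly what cuts down the number of specializations.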

This is a pretty large change to the original design, which very explicitly aimed to avoid runtime dispatch. Maybe that's fine in the name of simplifying things; I've wondered about this myself.

But I think testing against example.com is not going to tell you anything useful — mostly you're measuring network latency and perhaps also performance/load on the remote server. Rather, it's better to test against a high performance server running on localhost.
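A minimal way to set up such a localhost benchmark, assuming HTTP.jl's serve API (the port number is arbitrary):

```julia
using HTTP

# Serve a trivial response on localhost so the benchmark measures
# HTTP.jl itself rather than network latency or the remote server.
server = @async HTTP.serve("127.0.0.1", 8081) do req
    HTTP.Response(200, "ok")
end
sleep(1)  # give the server task a moment to start listening

@time HTTP.get("http://127.0.0.1:8081")
```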

Unfortunately we don't have the original author of the current design around anymore to ask for advice and motivating examples.

@rikhuijzer
Contributor Author

Thanks for taking a look at the PR. I just realized that my whole PR is based on a wrong assumption. I thought that the Next part of the methods was always dispatching on the full stacked type and therefore causing many unnecessary method instances. That's not true, so performance-wise the implementation was already very good.

I could still rewrite this PR to simplify the huge ternary statement in stack and the TopLayer, but I probably won't. So I'm closing this for now.

@rikhuijzer rikhuijzer closed this Aug 5, 2021
@c42f
Contributor

c42f commented Aug 6, 2021

I thought that the Next part of the methods was always dispatching on the full stacked type and therefore causing many unnecessary method instances.

Ah I see. Well, this isn't entirely true, but it may be partially true — while dispatch happens based on the individual layer types, specialization may happen based on the concrete type of the stack from the current layer downward (though the exact amount of specialization is an implementation detail of the compiler).

Personally I think it would be interesting and useful to change the stack into a stack of layer instances rather than types. Then the layers can contain state if necessary (for example, the connection pool layer would no longer need to rely on global state).

If you do want to consider this design further, I'd suggest abstracting the stack such that implementation of individual layers is decoupled from the concrete representation of the stack. Then users could choose their stack representation to hide information from the compiler (as in your Stack list) or give it all the information as in the current highly nested stack type.
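The layers-as-instances idea might look something like this (a sketch only; the names and fields are hypothetical, not a proposed HTTP.jl API):

```julia
# Layers as instances rather than types, so each layer can carry state.
abstract type Layer end

struct RedirectLayer <: Layer
    max_redirects::Int          # per-instance configuration
end

struct ConnectionPoolLayer <: Layer
    pool::Dict{String, Any}     # the pool lives in the layer instance,
end                             # not in a global variable

# The stack is then simply a collection of layer instances:
stack = Layer[RedirectLayer(3), ConnectionPoolLayer(Dict{String, Any}())]
```

With this shape, the stack representation (vector, tuple, linked list) can be varied independently of how each layer is implemented, which is the decoupling suggested above.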

@rikhuijzer
Contributor Author

rikhuijzer commented Aug 6, 2021

[...] specialization may happen based on the concrete type of the stack from the current layer downward (though the exact amount of specialization is an implementation detail of the compiler).

Fair enough. Well, on Julia 1.7 it looked fine when inspecting the method instances of HTTP.request via MethodAnalysis.jl.
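For reference, such an inspection might look like this (assuming MethodAnalysis.jl's methodinstances function; the package is not part of HTTP.jl):

```julia
using MethodAnalysis, HTTP

# Trigger compilation with one request, then count the compiled
# specializations of HTTP.request; fewer instances means less
# compilation work on first use.
HTTP.get("https://example.com")
length(methodinstances(HTTP.request))
```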

If you do want to consider this design further,

I don't think anymore that there is much to gain here, so probably not 🙂
