Support transparent gzip decompression #256

Closed
Alexander-Barth opened this issue Jun 19, 2018 · 9 comments
Labels
client About our HTTP client enhancement
Milestone

Comments

@Alexander-Barth

Some servers always respond with a compressed stream, e.g. the Stack Exchange API:

import HTTP
r = HTTP.get("https://api.stackexchange.com/2.2/questions"; query = Dict("site" => "stackoverflow"));

It would be nice if I could directly access the decompressed response body.
As a workaround, I can use CodecZlib:

using CodecZlib
String(transcode(GzipDecompressor,r.body))

Would it be possible to support transparent decompression, or is it already implemented?

Julia 0.6.3
HTTP.jl master 61e6b05
MbedTLS.jl 0.5.9

@samoconnor
Contributor

Yes, this could be supported.
This is the place to hook it in:

HTTP.jl/src/Messages.jl

Lines 395 to 401 in b602c4e

function decode(m::Message, encoding::String)::Vector{UInt8}
    if encoding == "gzip"
        # Use https://github.com/bicycle1885/TranscodingStreams.jl ?
    end
    @warn "Decoding of HTTP Transfer-Encoding is not implemented yet!"
    return m.body
end
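A minimal sketch of what the gzip branch could look like, assuming CodecZlib.jl is added as a dependency. It is written as a standalone function over a byte vector so it can be tried outside the package; `decode_body` is a hypothetical name, not HTTP.jl API. The `deflate` branch uses the zlib-wrapped decoder because HTTP's `deflate` content coding is the zlib format (RFC 1950), not raw DEFLATE.

```julia
using CodecZlib  # GzipDecompressor, ZlibDecompressor and the matching compressors

# Hypothetical standalone stand-in for the hook in Messages.jl:
# decode a body according to its Content-Encoding value.
function decode_body(body::Vector{UInt8}, encoding::AbstractString)::Vector{UInt8}
    enc = lowercase(strip(encoding))
    if enc == "gzip" || enc == "x-gzip"
        return transcode(GzipDecompressor, body)
    elseif enc == "deflate"
        # HTTP's "deflate" coding is the zlib format (RFC 1950), not raw DEFLATE
        return transcode(ZlibDecompressor, body)
    elseif enc == "identity" || isempty(enc)
        return body
    end
    @warn "Content-Encoding \"$encoding\" is not supported; returning the body unchanged"
    return body
end

# Usage: round-trip a small JSON payload through gzip
payload = Vector{UInt8}("{\"gzipped\":true}")
compressed = transcode(GzipCompressor, payload)
println(String(decode_body(compressed, "gzip")))
```

Unknown codings fall through to a warning rather than an error, matching the behaviour of the existing stub.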

@EricForgy
Contributor

This is cool. Are you guys OK with adding CodecZlib as a dependency? Making the change is easy. Any idea what would be a good test?

@Alexander-Barth
Author

A test case could use http://httpbin.org/gzip, which returns gzip-encoded data. For example, with Requests.jl:

julia> r = Requests.get("http://httpbin.org/gzip")
Response(200 OK, 13 headers, 221 bytes in body)

julia> print(readstring(r))
{"gzipped":true,"headers":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Connection":"close","Host":"httpbin.org","User-Agent":"Requests.jl/0.0.0"},"method":"GET","origin":"139.165.106.124"}

julia> r.headers
Dict{String,String} with 13 entries:
  "Connection"                       => "keep-alive"
  "Via"                              => "1.1 vegur"
  "Access-Control-Allow-Credentials" => "true"
  "Date"                             => "Wed, 20 Jun 2018 08:13:50 GMT"
  "Access-Control-Allow-Origin"      => "*"
  "http_minor"                       => "1"
  "Keep-Alive"                       => "1"
  "status_code"                      => "200"
  "Server"                           => "gunicorn/19.8.1"
  "Content-Length"                   => "191"
  "http_major"                       => "1"
  "Content-Type"                     => "application/json"
  "Content-Encoding"                 => "gzip"
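Since httpbin.org needs network access, a unit test could instead build its own gzip body locally. A minimal sketch, assuming only CodecZlib.jl and Julia's `Test` stdlib; the JSON content is illustrative, shaped like httpbin's reply. It checks the gzip magic bytes (0x1f 0x8b, RFC 1952) that a `Content-Encoding: gzip` body starts with, then the round trip.

```julia
using Test
using CodecZlib

# Illustrative JSON document shaped like httpbin's /gzip reply
const EXPECTED = """{"gzipped":true,"method":"GET"}"""

@testset "gzip response body" begin
    body = transcode(GzipCompressor, Vector{UInt8}(EXPECTED))
    # Every gzip member starts with the magic bytes 0x1f 0x8b (RFC 1952)
    @test body[1:2] == [0x1f, 0x8b]
    # The compressed bytes differ from the plain text...
    @test body != Vector{UInt8}(EXPECTED)
    # ...and decompress back to exactly the original document
    @test String(transcode(GzipDecompressor, body)) == EXPECTED
end
```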

@jakewilliami

jakewilliami commented May 31, 2021

I'm not sure if this is still being worked on, but regarding the decompression method, there seem to be two reasonable options (that I can find). Option 1 uses TranscodingStreams and CodecZlib (as suggested by @samoconnor above). Option 2 uses Libz, which seems to be slightly faster, although it allocates more memory in the benchmarks below. My test data is a 105-element Vector{UInt8}, but perhaps there is better test data out there.

using BenchmarkTools, TranscodingStreams, CodecZlib, Libz

function decode1(data::Vector{UInt8})
    return String(transcode(GzipDecompressor, data))
end

function decode2(data::Vector{UInt8})
    return String(read(ZlibInflateInputStream(data)))
end

Benchmarking both functions on this data (stored in `test`), we get:

julia> @btime decode1(test);
  1.770 μs (7 allocations: 832 bytes)

julia> @btime decode2(test);
  1.517 μs (7 allocations: 9.77 KiB)

Or, using the test data suggested by @Alexander-Barth (the body `r.body` of a response from http://httpbin.org/gzip):

julia> @btime decode1(r.body);
  4.948 μs (7 allocations: 1.31 KiB)

julia> @btime decode2(r.body);
  4.417 μs (7 allocations: 9.89 KiB)

Not sure if this information is useful, but thought I'd share what I've found.
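Besides the two one-shot decoders benchmarked above, CodecZlib also provides a streaming wrapper, which is the TranscodingStreams-based form of option 1. It can decompress incrementally while reading from an `IO` instead of requiring the whole compressed body as a vector first. A sketch, assuming the body is available as any readable `IO` (here an `IOBuffer`); `decode3` is a hypothetical name continuing the numbering above, and it was not benchmarked.

```julia
using CodecZlib  # GzipDecompressorStream wraps an IO (built on TranscodingStreams)

# Streaming variant of option 1: decompress while reading from any IO
function decode3(io::IO)
    stream = GzipDecompressorStream(io)
    try
        return String(read(stream))
    finally
        close(stream)  # release the zlib state held by the stream
    end
end

# Usage with an in-memory body standing in for a response body
data = transcode(GzipCompressor, Vector{UInt8}("hello, gzip"))
println(decode3(IOBuffer(data)))
```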

@ashwani-rathee

@jakewilliami this was useful, thank you so much!!

@fonsp fonsp added the client About our HTTP client label Mar 16, 2022
@quinnj quinnj added this to the 1.0 milestone May 27, 2022
@quinnj
Member

quinnj commented May 27, 2022

Going to put this on the 1.0 milestone. I'll try to work on this soon.

quinnj added a commit that referenced this issue May 28, 2022
Implements #256. If the content-encoding of a response is "gzip"
and the keyword argument `decompress === true`, then we'll
use CodecZlib.jl to decompress the response and set it as the response
body. Passing `decompress=false` will leave the response body as-is.
We also support `HTTP.decode(::Request, "gzip")` which will do
the decompression.
@quinnj
Member

quinnj commented May 28, 2022

PR is up: #838
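The rule described in the commit message can be illustrated with a small standalone function. This is only a sketch of the logic, not HTTP.jl's actual implementation: `maybe_decompress` is a hypothetical name, headers are modelled as a vector of pairs, and the header name is matched case-insensitively because HTTP field names are case-insensitive.

```julia
using CodecZlib

# Hypothetical sketch of the commit's rule: decompress only when the response
# says Content-Encoding: gzip and the caller has not opted out.
function maybe_decompress(headers::Vector{Pair{String,String}}, body::Vector{UInt8};
                          decompress::Bool=true)
    decompress || return body  # decompress=false: leave the body as-is
    for (name, value) in headers
        if lowercase(name) == "content-encoding" && lowercase(strip(value)) == "gzip"
            return transcode(GzipDecompressor, body)
        end
    end
    return body  # no gzip Content-Encoding: nothing to do
end

raw = transcode(GzipCompressor, Vector{UInt8}("{\"ok\":true}"))
hdrs = ["Content-Type" => "application/json", "Content-Encoding" => "gzip"]
println(String(maybe_decompress(hdrs, raw)))                     # decoded JSON
println(length(maybe_decompress(hdrs, raw; decompress=false)))   # still compressed
```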

quinnj added a commit that referenced this issue May 28, 2022
* Auto decompress gzip-encoding response bodies

Implements #256. If the content-encoding of a response is "gzip"
and the keyword argument `decompress === true`, then we'll
use CodecZlib.jl to decompress the response and set it as the response
body. Passing `decompress=false` will leave the response body as-is.
We also support `HTTP.decode(::Request, "gzip")` which will do
the decompression.

* fixes

* fix

* fix

* fix'

* fix
@Alexander-Barth
Author

Thank you very much for this great package and this new feature 🙂

@Anastasgrek

Hello everyone! I have a problem related to this topic: what about WebSocket connections? I am receiving packets with compressed data from the server, but I can't decompress them; CodecZlib throws `zlib error: <no message> (code: -5)`. It works if I do this:

using HTTP
using TranscodingStreams, CodecZlib
using JSON
function extract_data(io::IOBuffer)
    stream = TranscodingStream(DeflateDecompressor(), io)
    try
        for line in eachline(stream) end
    catch e
        # Only swallow the zlib Z_BUF_ERROR (code -5); rethrow anything else
        if !occursin("zlib error: <no message> (code: -5)", sprint(showerror, e))
            rethrow()
        end
    end
    l = stream.state.buffer1.transcoded
    d = stream.state.buffer1.data[1:l]
    return JSON.parse(String(d))
end

HTTP.WebSockets.open(
    "wss://stream.binance.com:9443/stream?streams=btcusdt@depth5@100ms",
    headers = [
        "sec-websocket-extensions" => "permessage-deflate; client_no_context_takeover; server_no_context_takeover, client_max_window_bits",
    ]
    ) do ws

    while true
        data = readavailable(ws) |> IOBuffer |> extract_data
        println(data["stream"])
        # break
    end
end

I understand that this PR covers gzip decompression, but in my opinion the protocol is not much different.
Do you have any ideas how to fix it?
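One likely explanation, based on RFC 7692 (permessage-deflate) rather than on anything HTTP.jl does: the sender compresses each message with a sync flush and strips the trailing `0x00 0x00 0xff 0xff`, so a payload is raw DEFLATE data that never reaches a final block. zlib runs out of input before end-of-stream and reports Z_BUF_ERROR, which is code -5. The RFC tells the receiver to append those four bytes back; the sketch below also appends an empty final stored block (`0x01 0x00 0x00 0xff 0xff`) so that CodecZlib's `DeflateDecompressor` (raw DEFLATE, no zlib or gzip header) sees a complete stream. It assumes `server_no_context_takeover` was actually accepted by the server, so each message decodes independently, and that the input is one complete message payload; `inflate_message` is a hypothetical helper name.

```julia
using CodecZlib  # DeflateDecompressor = raw DEFLATE (no zlib/gzip wrapper)

# Tail removed by the sender after its sync flush (RFC 7692)
const SYNC_TAIL = UInt8[0x00, 0x00, 0xff, 0xff]
# Empty stored block with BFINAL=1, so the inflater reaches end-of-stream
const FINAL_BLOCK = UInt8[0x01, 0x00, 0x00, 0xff, 0xff]

# Hypothetical helper: decompress one permessage-deflate message payload,
# assuming no context takeover (every message is independent)
function inflate_message(payload::AbstractVector{UInt8})
    data = vcat(payload, SYNC_TAIL, FINAL_BLOCK)
    return transcode(DeflateDecompressor, data)
end

# Usage with the "Hello" example payload from the RFC 7692 examples section
msg = UInt8[0xf2, 0x48, 0xcd, 0xc9, 0xc9, 0x07, 0x00]
println(String(inflate_message(msg)))
```

In the loop above this would replace the `IOBuffer |> extract_data` step with something like `JSON.parse(String(inflate_message(readavailable(ws))))`, provided each `readavailable` call returns exactly one complete message.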
