:zlib.inflate_nif/4 error with deflate-encoded resource #215

Closed
dorian-marchal opened this issue Aug 10, 2023 · 8 comments

Comments

@dorian-marchal

dorian-marchal commented Aug 10, 2023

Hello, when fetching deflate-encoded content, Req fails to decode it:

Req.get("https://httpbin.org/deflate", receive_timeout: 20_000)
** (ErlangError) Erlang error: :data_error
    (erts 13.2.2.2) :zlib.inflate_nif(#Reference<0.3668438451.4136501253.102340>, 8192, 16384, 0)
    (erts 13.2.2.2) :zlib.dequeue_all_chunks_1/3
    (erts 13.2.2.2) :zlib.inflate/3
    (erts 13.2.2.2) :zlib.unzip/1
    (elixir 1.15.4) lib/enum.ex:2510: Enum."-reduce/3-lists^foldl/2-0-"/3
    (elixir 1.15.4) lib/map.ex:916: Map.update!/3
    (req 0.3.11) lib/req/steps.ex:860: Req.Steps.decompress_body/1
    (req 0.3.11) lib/req/request.ex:755: anonymous fn/2 in Req.Request.run_response/2
    (elixir 1.15.4) lib/enum.ex:4830: Enumerable.List.reduce/3
    (elixir 1.15.4) lib/enum.ex:2564: Enum.reduce_while/3
    (req 0.3.11) lib/req/request.ex:683: Req.Request.run/1
    iex:8: (file)

It appears that there is a decoding issue for deflate-encoded content. HTTPoison seems to be able to decode it:

HTTPoison.get("https://httpbin.org/deflate", [], recv_timeout: 20_000)
{:ok,
 %HTTPoison.Response{
   status_code: 200,
   body: <<120, 156, 61, 142, 205, 14, 130, 48, 16, 132, 239, 60, 69, 211, 179, 173, 20, 168, 46, 38, 30, 56, 24, 245,
     106, 48, 241, 138, 116, 249, 137, 210, 154, 82, 15, 74, 120, 119, 11, 36, 94, 54, 153, 111, 103, 103, 118, 8, 8,
     161, 10, 171, 103, 225, 80, 209, 29, 113, 246, 141, 43, 50, 193, 6, 11, 133, 182, 247, 108, 240, 210, 131, 147,
     233, 157, 87, 180, 113, 238, 117, 111, 53, 55, 182, 166, 179, 213, 239, 174, 61, 90, 150, 213, 168, 23, 71, 81, 62,
     52, 126, 214, 130, 11, 224, 226, 111, 186, 177, 172, 251, 106, 150, 219, 162, 68, 118, 158, 218, 232, 197, 24, 183,
     23, 108, 147, 168, 4, 48, 137, 152, 4, 136, 212, 166, 42, 227, 168, 66, 169, 98, 140, 1, 96, 139, 97, 74, 125, 194,
     184, 252, 213, 161, 107, 204, 124, 124, 60, 228, 75, 54, 53, 182, 173, 91, 61, 177, 52, 228, 144, 114, 233, 167,
     164, 193, 24, 252, 0, 149, 102, 58, 41>>,
   headers: [
     {"Date", "Thu, 10 Aug 2023 07:14:14 GMT"},
     {"Content-Type", "application/json"},
     {"Content-Length", "181"},
     {"Connection", "keep-alive"},
     {"Server", "gunicorn/19.9.0"},
     {"Content-Encoding", "deflate"},
     {"Access-Control-Allow-Origin", "*"},
     {"Access-Control-Allow-Credentials", "true"}
   ],
   request_url: "https://httpbin.org/deflate",
   request: %HTTPoison.Request{
     method: :get,
     url: "https://httpbin.org/deflate",
     headers: [],
     body: "",
     params: %{},
     options: [recv_timeout: 20000]
   }
 }}

It seems to work with Finch, too:

Finch.build(:get, "https://httpbin.org/deflate") |> Finch.request(MyFinch)
{:ok,
 %Finch.Response{
   status: 200,
   body: <<120, 156, 61, 142, 49, 15, 130, 48, 16, 133, 119, 126, 69, 211, 217, 86, 170, 5, 193, 196, 129, 193, 168,
     171, 193, 196, 181, 208, 3, 154, 72, 107, 74, 93, 36, 252, 119, 15, 72, 92, 46, 121, 223, 189, 123, 239, 198, 136,
     16, 170, 161, 121, 169, 0, 154, 30, 73, 240, 31, 216, 144, 25, 118, 160, 52, 248, 1, 217, 136, 18, 193, 213, 13, 1,
     21, 237, 66, 120, 87, 198, 114, 231, 91, 186, 88, 113, 247, 24, 192, 179, 162, 5, 187, 56, 122, 99, 195, 86, 240,
     132, 139, 191, 225, 201, 138, 254, 107, 89, 233, 85, 13, 236, 54, 55, 209, 187, 115, 225, 36, 88, 42, 181, 84, 66,
     72, 22, 239, 85, 222, 168, 58, 67, 181, 19, 34, 215, 7, 153, 30, 42, 104, 82, 138, 9, 211, 250, 83, 15, 161, 115,
     203, 241, 229, 92, 174, 217, 212, 121, 211, 26, 59, 179, 60, 230, 89, 206, 19, 156, 9, 141, 166, 232, 7, 122, 240,
     56, 212>>,
   headers: [
     {"date", "Thu, 10 Aug 2023 08:34:32 GMT"},
     {"content-type", "application/json"},
     {"content-length", "177"},
     {"connection", "keep-alive"},
     {"server", "gunicorn/19.9.0"},
     {"content-encoding", "deflate"},
     {"access-control-allow-origin", "*"},
     {"access-control-allow-credentials", "true"}
   ]
 }}

Note: if httpbin.org is down, you can test with the kennethreitz/httpbin Docker image.
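
Both bodies above begin with the bytes 120, 156 (0x78 0x9C), which is consistent with a zlib (RFC 1950) header. That suggests these clients are returning the body still compressed rather than decoding it. A minimal sketch of that header check follows; the module name `ZlibHeaderCheck` is hypothetical and the check is only a heuristic, since a raw deflate stream could in principle start with bytes that pass it.

```elixir
defmodule ZlibHeaderCheck do
  # RFC 1950 header: the first byte (CMF) carries the compression method in its
  # low nibble (8 = deflate) and the window size in its high nibble (at most 7).
  # The 16-bit value CMF * 256 + FLG must be a multiple of 31.
  def zlib_wrapped?(<<cmf, flg, _rest::binary>>) do
    rem(cmf, 16) == 8 and div(cmf, 16) <= 7 and rem(cmf * 256 + flg, 31) == 0
  end

  # Anything shorter than two bytes cannot carry a zlib header.
  def zlib_wrapped?(_other), do: false
end

# First bytes of the HTTPoison body shown above.
IO.inspect(ZlibHeaderCheck.zlib_wrapped?(<<120, 156, 61, 142>>))
```

Applied to the first bytes of either response body, the check indicates a zlib wrapper, which would explain why a raw-deflate decoder rejects the data.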

@wojtekmach
Owner

Thank you for the report. The problem is we're using :zlib.unzip/1 for deflate:

https://github.com/wojtekmach/req/blob/v0.3.11/lib/req/steps.ex#L871

however it seems we should use :zlib.uncompress/1:

iex> Req.get!("http://localhost:8000/deflate", raw: true).body |> :zlib.uncompress()
"{\n  \"deflated\": true, \n  \"headers\": {\n    \"Accept-Encoding\": \"zstd, br, gzip, deflate\", \n    \"Host\": \"localhost:8000\", \n    \"User-Agent\": \"req/0.3.11\"\n  }, \n  \"method\": \"GET\", \n  \"origin\": \"172.17.0.1\"\n}\n"

I'm a bit lost as to which content-encoding header value should correspond to which :zlib function, so any links to definitive references would be very helpful.
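
As a rough orientation, Erlang's :zlib module pairs each one-shot compressor with a matching decompressor, and each pair uses a different container format. The sketch below round-trips a made-up payload through all three pairs and then feeds zlib-wrapped data to :zlib.unzip/1, which is the mismatch seen in the stack trace. The payload and variable names are illustrative only.

```elixir
payload = ~s({"deflated": true})

# RFC 1950: zlib header + deflate stream + Adler-32 checksum.
zlib_wrapped = :zlib.compress(payload)
# RFC 1951: bare deflate stream, no header or checksum.
raw_deflate = :zlib.zip(payload)
# RFC 1952: gzip header + deflate stream + CRC-32.
gzipped = :zlib.gzip(payload)

# Each decompressor only understands the format of its own counterpart.
^payload = :zlib.uncompress(zlib_wrapped)
^payload = :zlib.unzip(raw_deflate)
^payload = :zlib.gunzip(gzipped)

# Feeding zlib-wrapped data to the raw-deflate decoder is expected to fail
# the same way as the stack trace in this issue.
result =
  try do
    {:ok, :zlib.unzip(zlib_wrapped)}
  rescue
    e in ErlangError -> {:error, e.original}
  end

IO.inspect(result)
```

Under this reading, `content-encoding: gzip` lines up with :zlib.gunzip/1 and a spec-conformant `content-encoding: deflate` lines up with :zlib.uncompress/1, while :zlib.unzip/1 only fits servers that send raw deflate.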

@dorian-marchal
Author

dorian-marchal commented Aug 10, 2023

I'm a bit lost as to which content-encoding header value should correspond to which :zlib function, so any links to definitive references would be very helpful.

Not sure either, but for "deflate", :zlib.inflate/2 looks more appropriate, maybe? I'm way out of my comfort zone here, sorry. 😅
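
For context, :zlib.inflate/2 is the streaming primitive that the one-shot helpers are built on; which container it accepts is decided by the window bits passed to :zlib.inflateInit/2. A minimal sketch, assuming the whole body is already in memory (the module and function names are hypothetical):

```elixir
defmodule InflateSketch do
  # window_bits selects the expected container:
  #   15  -> zlib wrapper (RFC 1950)
  #   -15 -> raw deflate (RFC 1951)
  #   31  -> gzip (RFC 1952), i.e. 16 + 15
  def inflate(data, window_bits) when is_binary(data) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z, window_bits)
      chunks = :zlib.inflate(z, data)
      :ok = :zlib.inflateEnd(z)
      IO.iodata_to_binary(chunks)
    after
      # Always release the zlib stream, even if inflating raised.
      :zlib.close(z)
    end
  end
end

IO.inspect(InflateSketch.inflate(:zlib.compress("hello"), 15))
```

So :zlib.inflate/2 on its own does not settle the question; the choice between 15 and -15 is the same zlib-versus-raw-deflate decision as choosing between :zlib.uncompress/1 and :zlib.unzip/1.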

@wojtekmach
Owner

Looks like Tesla has the same issue:

iex> Mix.install([:tesla])
iex> Tesla.client([Tesla.Middleware.DecompressResponse]) |> Tesla.get("http://localhost:8000/deflate")
** (ErlangError) Erlang error: :data_error
    (erts 14.0.2) :zlib.inflate_nif(#Reference<0.2165264108.559022081.244095>, 8192, 16384, 0)
    (erts 14.0.2) :zlib.dequeue_all_chunks_1/3
    (erts 14.0.2) :zlib.inflate/3
    (erts 14.0.2) :zlib.unzip/1
    (tesla 1.7.0) lib/tesla/middleware/compression.ex:65: Tesla.Middleware.Compression.decompress/1
    (tesla 1.7.0) lib/tesla/middleware/compression.ex:60: Tesla.Middleware.Compression.decompress/1
    iex:16: (file)

localhost:8000 is httpbin running with Docker.

@dorian-marchal
Author

dorian-marchal commented Aug 10, 2023 via email

@wojtekmach
Owner

It does. Why?

@dorian-marchal
Author

Thinking out loud, sorry, it doesn't matter actually as the compression is handled by Tesla.

@wojtekmach
Owner

@dorian-marchal I decided to just remove support for deflate. If it's a blocker for you to adopt Req, please let me know. There's a pretty funny/tragic history about it pasted below. Given the confusion and that support varies (or, you know, varied like 10-15 years ago and a bunch of people decided to drop it too) I don't want to deal with it unless there are actual use cases for this.

Per https://zlib.net/zlib_faq.html#faq39

What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?

"gzip" is the gzip format, and "deflate" is the zlib format. They should probably have called the second one "zlib" instead to avoid confusion with the raw deflate compressed data format. While the HTTP 1.1 RFC 2616 correctly points to the zlib specification in RFC 1950 for the "deflate" transfer encoding, there have been reports of servers and browsers that incorrectly produce or expect raw deflate data per the deflate specification in RFC 1951, most notably Microsoft. So even though the "deflate" transfer encoding using the zlib format would be the more efficient approach (and in fact exactly what the zlib format was designed for), using the "gzip" transfer encoding is probably more reliable due to an unfortunate choice of name on the part of the HTTP 1.1 authors.

Bottom line: use the gzip format for HTTP 1.1 encoding.

@wojtekmach
Owner

OK, RFC 9110 § 8.4.1.2 says:

The "deflate" coding is a "zlib" data format [RFC1950] containing a "deflate" compressed data stream [RFC1951] that uses a combination of the Lempel-Ziv (LZ77) compression algorithm and Huffman coding.

Note: Some non-conformant implementations send the "deflate" compressed data without the zlib wrapper.

(emphasis mine)

So yeah, it is well specified after all; we can bring back support if someone really needs it.
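
If support were brought back, one possible shape is a decoder that follows the RFC first and tolerates the non-conformant case second. The sketch below is not Req's implementation; `TolerantDeflate` and its functions are hypothetical names, and it assumes the full body is available as a binary.

```elixir
defmodule TolerantDeflate do
  # Try the spec-conformant zlib wrapper (RFC 1950) first, then fall back to
  # raw deflate (RFC 1951) for servers that omit the wrapper.
  def decode(body) when is_binary(body) do
    with :error <- attempt(&:zlib.uncompress/1, body),
         :error <- attempt(&:zlib.unzip/1, body) do
      {:error, :data_error}
    end
  end

  # Runs one decompressor and turns a raised error into :error.
  defp attempt(fun, body) do
    {:ok, fun.(body)}
  rescue
    ErlangError -> :error
    ArgumentError -> :error
  end
end

IO.inspect(TolerantDeflate.decode(:zlib.compress("hello")))
IO.inspect(TolerantDeflate.decode(:zlib.zip("hello")))
```

Trying the zlib format first keeps conformant servers on the cheap path; the fallback costs a second decompression attempt only when the first one raises.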
