
Support accepting gzipped requests #1091

Closed
orf opened this issue Sep 14, 2018 · 26 comments

orf (Contributor) commented Sep 14, 2018

Feature Request

Describe the problem the feature is intended to solve

Currently tensorflow-serving does not handle gzipped request bodies on the REST endpoint, which can slow down POSTing large volumes of data. If a gzipped body is sent, even with the correct headers, the request fails with a JSON parse error. Networks are fast, sure, but why send ~10 MB JSON bodies when you can send ~100 KB ones?

Describe the solution

If Content-Encoding: gzip is sent as a header then the body should be decompressed before parsing.

Describe alternatives you've considered

A reverse proxy that decompresses request bodies before passing them to tf-serving, but this is not an ideal solution and adds overhead.

Additional context

When encoding an image for inference, the resulting JSON can be very large, upwards of 10 megabytes. Gzipped, this is often reduced to ~400 KB.
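To illustrate the savings on the client side, here is a minimal sketch (Python standard library only) that gzips a JSON predict payload and attaches the Content-Encoding: gzip header. The model name, port, and payload shape are placeholders, not taken from the thread.

```python
import gzip
import json
import urllib.request

# Hypothetical payload: a large "instances" list, as sent to a
# TF Serving REST :predict endpoint.
payload = json.dumps({"instances": [[0.5] * 784] * 100}).encode("utf-8")
body = gzip.compress(payload)

print(f"raw: {len(payload)} bytes, gzipped: {len(body)} bytes")

# Build the request; the URL and model name below are illustrative.
req = urllib.request.Request(
    "http://localhost:8501/v1/models/my_model:predict",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",  # tells the server the body is compressed
    },
    method="POST",
)
# resp = urllib.request.urlopen(req)  # requires a running model server
```

For a highly repetitive JSON body like this one, the gzipped size is a small fraction of the raw size, which is the effect the issue is asking to exploit.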

ymodak (Contributor) commented Sep 26, 2018

@orf Thank you for the fix and verification. This is an interesting addition.

orf (Contributor, Author) commented Sep 26, 2018

Just to give an idea of the numbers: we have seen our payload size reduced by up to 15x (with a mean of 10x) and significantly reduced transfer times.

Using gRPC would be optimal here (as it supports compression?), but JSON is often better supported and easier to adopt piecemeal.

@ymodak ymodak self-assigned this Sep 27, 2018
gautamvasudevan (Collaborator) commented Oct 2, 2018

Per conversations - @wenbozhu is working on this.

gautamvasudevan (Collaborator) commented:

Fixed by b94f6c8

orf (Contributor, Author) commented Oct 17, 2018

Thank you very much @wenbozhu!

wenbozhu (Contributor) commented:

@orf Reading your FR again, I am not sure why there was a JSON parsing error when gzip was not supported.

Also, is gzip useful for the "download" (response) case?

orf (Contributor, Author) commented Oct 23, 2018 via email

wenbozhu (Contributor) commented:
"
Describe the problem the feature is intended to solve

Currently tensorflow-serving does not handle GZipped request bodies in the REST endpoint, which can slow down POSTing large volumes of data to tensorflow-serving. It fails with a JSON parse error if this is the case, even if the correct headers are sent. ....
"

orf (Contributor, Author) commented Oct 24, 2018

Oh right! Sorry. If you send a gzipped request body and the server does not decode it (because it does not support compressed requests) but blindly passes it to the JSON decoder, parsing fails, since to the parser the body is just opaque compressed bytes. That's where the error was coming from, I guess.

This should not happen now that your PR has been merged though.
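The failure mode described above can be sketched in a few lines (Python, illustrative): gzipped bytes handed straight to a JSON parser are rejected as garbage, while decompressing first succeeds.

```python
import gzip
import json

raw = json.dumps({"instances": [[1.0, 2.0]]}).encode("utf-8")
compressed = gzip.compress(raw)

# Passing the compressed bytes straight to the JSON parser fails:
try:
    json.loads(compressed)
    parsed_without_decompress = True
except (ValueError, UnicodeDecodeError):
    parsed_without_decompress = False

# Decompressing first, as a server honoring Content-Encoding: gzip would:
doc = json.loads(gzip.decompress(compressed))
print(parsed_without_decompress, doc["instances"])
```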

wenbozhu (Contributor) commented:
OK. Then is it important to support gzipped response bodies?

orf (Contributor, Author) commented Oct 24, 2018

I'd say yes: it doesn't seem like it would take much more effort now that this is done, and it's an easy win. If you are sending many requests in a batch, the responses can get quite large.

shlomiken commented Apr 7, 2019

Hi,
can someone give an example of how to send such a gzipped request to :predict from curl? It does not appear in the documentation, and I get 400 Bad Request when sending the gzipped JSON as binary data. For example:
curl -v --data-binary @body.gz -H'Content-Encoding: gzip' -X POST http://localhost:8501/v1/models/pos-model:predict

On the server I get these errors:
: Got zlib error: -3
[evhttp_request.cc : 199] RAW: Failed to uncompress the gzipped body
[evhttp_request.cc : 236] RAW: Got zlib error: -4
[evhttp_request.cc : 199] RAW: Failed to uncompress the gzipped body
[evhttp_request.cc : 236] RAW: Got zlib error: -4

ttang235 commented:

(quoting shlomiken's comment above)

Maybe it's because there are too many instances in body.gz?
I also have this problem. It looks like if I send fewer than 100 instances in one request it works, but if I send 500 instances it fails. The even more confusing thing is that the boundary isn't stable: sometimes the max number of instances tf serving can handle is 200, sometimes 188, etc. (If you're curious, I used binary search to find the boundary.)

Could anyone explain this?
I'm using tf serving 1.12.0.

ttang235 commented:

(quoting my previous comment)

Is the maximum size of the unzipped body 10 MB? (Based on kMaxUncompressedBytes in this code: b94f6c8.)
In my case, the total size of 200 instances is less than 300 KB, so it shouldn't be a problem, but it still sometimes fails. Why? Could you please help answer this? @wenbozhu

PS: I'm using tf serving 1.12.0 image in docker hub, in case it's related to this issue.

Thanks!

wenbozhu (Contributor) commented:

You mentioned both zlib error -3 and -4; I suppose the first one is a typo?

-4 means the compressed data is corrupted or the server failed to allocate memory. Is it possible that body.gz is wrong, or that curl double-compresses when you specify Content-Encoding: gzip manually?

shlomiken commented Apr 12, 2019

(quoting the previous comment)

This is not a typo; I got both -3 and -4 (maybe on different calls; I have now verified and mostly get -3).
The file was a JSON file (26 MB) that predicted successfully when uncompressed, which I zipped using zip json.gz file.json

wenbozhu (Contributor) commented:

You should use gzip, i.e. gzip file.json

gzip adds headers that include the size of the uncompressed data. This is important because we want to make one memory allocation for all the uncompressed data.
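One caveat to the comment above: the gzip format (RFC 1952) actually records the uncompressed size in its trailer rather than its header. The last four bytes of a gzip stream are ISIZE, the input length modulo 2^32. A quick sketch reading it with the Python standard library:

```python
import gzip
import struct

data = b"x" * 50_000
blob = gzip.compress(data)

# ISIZE: last 4 bytes of a gzip stream, little-endian,
# uncompressed length mod 2**32 (RFC 1952).
(isize,) = struct.unpack("<I", blob[-4:])
print(isize)  # 50000
```

This is also likely why a file made with zip fails: zip produces a ZIP archive, a different container format that a gzip inflater rejects as corrupt data (Z_DATA_ERROR, i.e. -3), whereas gzip file.json emits the RFC 1952 stream the server expects.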

shlomiken commented Apr 14, 2019

Hi @wenbozhu, thanks for helping out.
I zipped as you suggested with gzip file.json and ran this curl command:
curl -v -s --trace-ascii http_trace.log --data-binary @file.json.gz -H "Content-Type: application/json" -H "Content-Encoding: gzip" -X POST http://localhost:8080/v1/models/pos-model:predict

and get this as the HTTP trace on the curl side:

== Info: We are completely uploaded and fine
<= Recv header, 26 bytes (0x1a)
0000: HTTP/1.1 400 Bad Request
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 37 bytes (0x25)
0000: Date: Sun, 14 Apr 2019 13:27:42 GMT
<= Recv header, 20 bytes (0x14)
0000: Content-Length: 54
<= Recv header, 2 bytes (0x2)
0000: 
<= Recv data, 54 bytes (0x36)
0000: { "error": "JSON Parse error: The document is empty" }

On the model server I see this error:
evhttp_request.cc : 236] RAW: Got zlib error: -4 tenserve_app | [evhttp_request.cc : 199] RAW: Failed to uncompress the gzipped body

wenbozhu (Contributor) commented:

Thanks for the log; will look into this.

ZhouyihaiDing commented:

I am trying to debug this but I am not able to reproduce it.

I was using the TF model example, increased the number of instances, and used the curl command mentioned by shlomiken. However, I got the correct response.

shlomiken commented:

(quoting the previous comment)

Maybe it's a big file; mine is about 8.5 MB after gzip.

ZhouyihaiDing commented Apr 19, 2019

Thanks! Still looking at it.
FYI, the 10 MB limit on uncompressed size means the data size after decompression.
Your data is already 8.5 MB after gzip, so it's very likely the original size is greater than 10 MB, which would lead to a -4 error.

shlomiken commented:

(quoting the previous comment)

I don't understand: I sent 26 MB (uncompressed) without any problem; the 8.5 MB is the compressed size.
So is there a 10 MB limit or not? How did I manage to send 26 MB?

wenbozhu (Contributor) commented:

The limit is on uncompressed data, and it is enforced only when the request body is gzipped.

This does cause inconsistent API semantics, and we will fix it.

===

The uncompression is broken when the body is too large, and we need to fix this: either buffer the entire body or enable streamed uncompression.

Re-opening the bug to add the fix and a test.

@wenbozhu wenbozhu reopened this Apr 26, 2019
wenbozhu (Contributor) commented Apr 26, 2019

Correction: the intent of the current behavior is to use RequestHandlerOptions::set_auto_uncompress_max_size() to limit the uncompressed size (as a security limit).

If we expect bodies larger than 10 MB, then we need to override the default in the model server, e.g. to 100 MB.

@netfs

===

The fix in httpserver is still needed.

wenbozhu (Contributor) commented May 3, 2019

This is fixed upstream. The default maximum number of uncompressed bytes is now 100 MB.

@wenbozhu wenbozhu closed this as completed May 3, 2019
@misterpeddy misterpeddy added the type:performance Performance Issue label Nov 18, 2019