Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak with invalid status code #316

Open
Jorgevillada opened this issue Oct 31, 2021 · 4 comments
Open

Memory Leak with invalid status code #316

Jorgevillada opened this issue Oct 31, 2021 · 4 comments

Comments

@Jorgevillada
Copy link

Hi, thank you for this project.

I been testing this project in kubernetes with 100/200 Req/sec and with invalid status code (> 299) there are some problems with the memory and a lot of OOMKilled in a day(10-15).

I tested this PR updated with the current version of the branch main(#235) but it has other errors. (Probably channels closed)

error copying response: readfrom tcp 127.0.0.1:8080->127.0.0.1:53150: write tcp 127.0.0.1:8080->127.0.0.1:53150: write: broken pipe

This image is from prometheus with the current branch main.
Memory incrase until OOMKilled happens
image

I been trying some code modifications and this behavior changes when added this If (after this if https://github.com/willnorris/imageproxy/blob/main/imageproxy.go#L484-L487)

if should304(req, resp) {
		// bare 304 response, full response will be used from cache
		return &http.Response{StatusCode: http.StatusNotModified}, nil
}
if resp.StatusCode > 299 {
    return nil, fmt.Errorf("invalid http code %s: %s", req.URL.String(), resp.Status)
}

and this. https://github.com/willnorris/imageproxy/blob/main/imageproxy.go#L207
i moved it before this if.

resp, err := p.Client.Do(actualReq)
// close the original resp.Body, even if we wrap it in a NopCloser below
if resp != nil && resp.Body != nil {
  defer resp.Body.Close()
}

if err != nil {
  msg := fmt.Sprintf("error fetching remote image: %v", err)
  p.log(msg)
  http.Error(w, msg, http.StatusInternalServerError)
  metricRemoteErrors.Inc()
  return
}

I'm no sure if this status code validation works in all escenarios, but for me it works.
This is prometheus with this change(1 hour before vs 9 hour now)
image

The memory problem is still present, but the behavior is different.

I added "net/http/pprof" to try to find aditional strange behaviors.
and I found this.
image

It seems like a problem with cached request too.

I found this open PR https://github.com/gregjones/httpcache/pull/102/files

This is from pprof:
Heap: https://drive.google.com/file/d/1lH9oYfO_88EAAMdxEwZV1FrcNS-3CaN-/view?usp=sharing
goroutine: https://drive.google.com/file/d/1_jinUorbPVNLxEMrUbzXs5Z9ITT5_2MD/view?usp=sharing

Relates to: #97 #196 #92

@gardner
Copy link

gardner commented Aug 1, 2022

nginx can rate limit request:

limit_req_zone global zone=img:1m rate=2r/s;
server {
    listen       80;
    server_name example.com;

    location ^~ /img/ {
        limit_req zone=img burst=20;
        proxy_pass http://img:8080/;
    }

@netik
Copy link

netik commented Sep 9, 2022

Rate limiting isn't the answer when there are actual memory leaks to contend with.

I'm testing these patches on my install now.

@gardner
Copy link

gardner commented Sep 12, 2022

This issue ultimately prevented me from using the software. I ended up converting everything to webp in a preprocessing batch job.

@LoopControl
Copy link

I'm running into this issue now with increased traffic -- every request is getting timed out with messages like:

error copying response: readfrom tcp 127.0.0.1:8080->127.0.0.1:39936: write tcp 127.0.0.1:8080->127.0.0.1:39936: write: broken pipe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants