
http_listener: pre-allocate buffer for reading request body #1892

Closed. supershal wants to merge 3 commits from the supershal_1823 branch.

Conversation

supershal (Author) commented:

Required for all PRs:

  • CHANGELOG.md updated (we recommend not updating this until the PR has been approved by a maintainer)
  • Sign CLA (if not already signed)
  • README.md updated (if adding a new plugin)

Fixes: #1823 and #1856
We are using http_listener as a relay to route traffic between two influxdb instances in our HA setup. We noticed memory leaks when sending an average of 800 metrics per batch from around 500 instances running telegraf. The bugfix #1826 caused an issue with metrics parsing (#1856).

This fix handles incoming requests the same way as influxdb's HTTP write handler: it reads the Content-Length header, pre-allocates a buffer of that size, and reads all the data before parsing it with a scanner. We have done some stress tests on the fix and have been running the fixed version for two days without any issues. The telegraf instances do not seem to leak memory and memory usage remains low.
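
For reference, a minimal sketch of the idea (not the exact code in this PR; the function and package names are illustrative): size the buffer from the declared Content-Length before reading, instead of letting it grow from ioutil.ReadAll's 512-byte default.

```go
package sketch

import (
	"bytes"
	"net/http"
)

// readRequestBody pre-sizes the buffer from the declared Content-Length so the
// underlying byte slice does not have to grow repeatedly while the body is read.
func readRequestBody(req *http.Request) ([]byte, error) {
	var buf bytes.Buffer
	if req.ContentLength > 0 {
		buf.Grow(int(req.ContentLength))
	}
	if _, err := buf.ReadFrom(req.Body); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```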

sparrc (Contributor) commented Oct 12, 2016

@supershal do you happen to have any steps to reproduce this? I haven't been able to reproduce it under very high load using our influx stress tool (>300,000 points/s).

I think that this is going to have the same issue that #1826 was intending to solve, which is that we are basically doing an ioutil.ReadAll with the request body, from what I can tell of this change.

we will need to test with very large request bodies, larger than 100,000 points.

supershal (Author) commented Oct 12, 2016

@sparrc I noticed higher memory usage when http_listener times out while reading the request body.
With the old code (with ioutil.ReadAll), I was able to reproduce the issue in our staging environment by reducing http_listener's "read_timeout" config to less than 10s and sending 100 http requests/second with 1000 points each. We are using c3.large instances in our staging environment.

With the fix above, using the same timeout for the same traffic, I do see a lot of timeout errors in the log, but memory does not go up.

I was not able to reproduce it in my local tests. I suspect GC is collecting memory faster on my local box; I would have to create a configuration similar to a c3.large (2 cores) to reproduce it consistently. I will try to create a comprehensive stress test tonight.

I think the difference between ioutil.ReadAll and the fix above is the pre-allocated buffer capacity: the byte slice does not have to expand repeatedly while reading bytes from the request body. I think that could be the reason for the timeouts.
ioutil.ReadAll starts with a 512-byte buffer: https://golang.org/src/io/ioutil/ioutil.go?s=1147:1188#L31
Buffer.ReadFrom expands it once it reaches its capacity: https://golang.org/src/bytes/buffer.go?s=5609:5668#L147
If enough capacity is allocated in advance, the repeated expansion of the buffer and copying of its contents is not triggered as often.
However, I don't know how GC reclaims memory in either case. I did not notice memory going up with the fix above after introducing timeouts.

sparrc (Contributor) commented Oct 13, 2016

It makes some sense that pre-allocating the buffer would perform better than growing it from an initial 512 bytes, though I'm not sure I understand why the memory usage is so much lower either.

bytes.Buffer also has an incremental reading function Buffer.ReadBytes(delim) that could be used to read & parse the buffer line-by-line.

We may want to look into using that as well. I know that in InfluxDB it's much more efficient to batch and write all the points together, and this might be why they didn't do it incrementally.
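
Roughly what that could look like (an illustrative sketch only; the parse callback and names are placeholders, not code from this PR):

```go
package sketch

import (
	"bytes"
	"io"
)

// parseLines reads the buffered body one line at a time with ReadBytes and
// hands each line to a parse callback.
func parseLines(buf *bytes.Buffer, parseLine func([]byte) error) error {
	for {
		line, err := buf.ReadBytes('\n')
		if len(line) > 0 {
			if perr := parseLine(line); perr != nil {
				return perr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```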

sparrc (Contributor) commented Oct 13, 2016

@supershal I've been experiencing better performance and more consistent writes using your changes 👍 . I've made a few adjustments, mainly:

  1. limit the size of the request using http.MaxBytesReader
  2. use the http.Request object's ContentLength value rather than parsing the header using strconv
  3. continue through the function if there is a failure in the parser.Parse function, because it's possible for this function to fail but still have parsed some metrics correctly.
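
A hedged sketch of those three adjustments (the handler shape, the size limit, and the parser stub are illustrative, not the PR's actual code):

```go
package sketch

import (
	"bytes"
	"log"
	"net/http"
)

const maxBodySize = 10 * 1024 * 1024 // illustrative limit

// parseMetrics is a stand-in for parser.Parse: it may return some metrics
// alongside an error when only part of the payload fails to parse.
func parseMetrics(b []byte) ([][]byte, error) {
	return bytes.Split(bytes.TrimSpace(b), []byte("\n")), nil
}

func writeHandler(res http.ResponseWriter, req *http.Request) {
	// 1. cap the request size; reads past the limit fail with an error
	body := http.MaxBytesReader(res, req.Body, maxBodySize)

	// 2. pre-size from req.ContentLength rather than re-parsing the header
	var buf bytes.Buffer
	if req.ContentLength > 0 {
		buf.Grow(int(req.ContentLength))
	}
	if _, err := buf.ReadFrom(body); err != nil {
		http.Error(res, "unable to read request body", http.StatusRequestEntityTooLarge)
		return
	}

	// 3. keep whatever parsed correctly even if Parse reports an error
	metrics, err := parseMetrics(buf.Bytes())
	for _, m := range metrics {
		log.Printf("accepted: %s", m) // the real handler adds these to the accumulator
	}
	if err != nil {
		http.Error(res, err.Error(), http.StatusBadRequest)
		return
	}
	res.WriteHeader(http.StatusNoContent)
}
```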

sparrc added this to the 1.1.0 milestone Oct 13, 2016
supershal (Author) commented:

@sparrc Thanks for making enhancements to the fix.

sparrc force-pushed the supershal_1823 branch 15 times, most recently from bd0af9a to 9ced250 on October 14, 2016 at 23:00
supershal (Author) commented:

@sparrc These changes help a lot to read the http body faster.
However, request processing through telegraf is not fast enough to use http_listener + telegraf as a relay between telegrafs and influxdb.
I quickly hacked up this brute-force program to test standalone influxdb and http_listener + telegraf: https://gist.github.com/supershal/706adee2062817d0476df18af4b1e166
It generates a load similar to our production environment.
I tested it directly against a local influxdb vs. http_listener -> telegraf -> file, influxdb.
Direct influxdb inserts for the same traffic take 500 ms on average, while http_listener + telegraf takes more than 10s. This causes a lot of timeouts and http requests stack up.

While going through the code, I noticed that metrics get flushed to the output writer when the metric batch size is reached while adding a point to each output buffer.
https://github.com/influxdata/telegraf/blob/master/internal/models/running_output.go#L76
It gets worse when we have multiple output plugins, since points are added to each output serially: https://github.com/influxdata/telegraf/blob/master/agent/agent.go#L280
I think this is blocking the channel in the Accumulator and eventually the input plugin.

I will play around with a couple of fixes to see if I can get performance similar to writing against a single influxdb. Let me know your thoughts and suggestions for improving the performance.

sparrc (Contributor) commented Oct 15, 2016

@supershal I've made some changes to your original PR.

I don't think that reading the entire http body as fast as possible is the best solution, because in some instances the body might be very large and we don't want to create very large allocations. For that reason I've changed it to read the body byte-by-byte into a buffer capped at 10MB, flushing each time it fills.

Any perf-improvement PRs are appreciated, but I wouldn't recommend relying on Telegraf as a high-throughput relay to InfluxDB, and I wouldn't recommend implementing this setup in your environment, regardless of the performance improvements we can make.

From what you've said so far, I'm fairly certain that if you simply use the flush_jitter config option in telegraf, you will have no problems writing directly to influxdb. This is exactly the use case that config option was designed for.

sparrc force-pushed the supershal_1823 branch 2 times, most recently from ce24c8f to 2504b97 on October 15, 2016 at 17:46
sparrc force-pushed the supershal_1823 branch 8 times, most recently from 8760187 to 71aa0d5 on October 17, 2016 at 12:14
If the content-length of the http request is smaller than the maximum size, then we will read the entire body at once.

If the content-length of the http request is larger than the maximum size, then we will read the request body in chunks no larger than 20MB at a time. This is to prevent the http handler from creating huge allocations and potentially OOMing if a user posts a large file of metrics to the /write endpoint.

joelegasse left a comment


I think this change is unnecessarily complex.

The "chunking" behavior can be done with a simple []byte, io.ReadFull, and bytes.LastIndexByte rather than the convoluted usage of bufio.Reader, bytes.Buffer, reading one byte at a time, guessing when we might be filling the buffer, and arbitrarily modifying the input data to attempt to parse it...

The simplified approach of grabbing a timestamp ahead of time would also eliminate the difference between the "chunked" and "read full" behavior. Also, the MaxBytesReader should be added after the gzip reader, to prevent a seemingly small payload from causing problems as it gets processed.
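
A rough sketch of the suggested shape, assuming a fixed chunk size and a placeholder parse callback (not the reviewer's actual code):

```go
package sketch

import (
	"bytes"
	"io"
)

// chunkedRead fills a fixed buffer with io.ReadFull, parses up to the last
// complete line found via bytes.LastIndexByte, and carries the partial tail
// into the next read.
func chunkedRead(r io.Reader, parse func([]byte) error) error {
	buf := make([]byte, 10*1024*1024)
	tail := 0 // bytes carried over from the previous chunk
	for {
		n, err := io.ReadFull(r, buf[tail:])
		data := buf[:tail+n]
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			// final chunk: parse whatever is left
			if len(data) > 0 {
				return parse(data)
			}
			return nil
		}
		if err != nil {
			return err
		}
		// buffer is full: parse up to and including the last newline
		i := bytes.LastIndexByte(data, '\n')
		if i < 0 {
			return io.ErrShortBuffer // a single line larger than the buffer
		}
		if perr := parse(data[:i+1]); perr != nil {
			return perr
		}
		tail = copy(buf, data[i+1:])
	}
}
```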

// DEFAULT_REQUEST_BODY_MAX is the default maximum request body size, in bytes.
// if the request body is over this size, we will return an HTTP 413 error.
// 1 GB
DEFAULT_REQUEST_BODY_MAX = 1 * 1000 * 1000 * 1000


This might be a little high as a default...

t.acc.AddFields(m.Name(), m.Fields(), m.Tags(), m.Time())
}
partial = "partial write: "
var msg413 bytes.Buffer


I would suggest breaking these out as separate handlers, rather than embedded in the switch statement.

} else {
res.WriteHeader(http.StatusNoContent)
body = http.MaxBytesReader(res, req.Body, t.MaxBodySize)


MaxBytesReader should be set before the gzip check, so that this line isn't duplicated.

var body io.ReadCloser
var err error
if req.Header.Get("Content-Encoding") == "gzip" {
body, err = gzip.NewReader(http.MaxBytesReader(res, req.Body, t.MaxBodySize))


The gzip reader (and not the underlying connection) should be closed, probably with a defer.

var msg413 bytes.Buffer
var msg400 bytes.Buffer
defer func() {
if msg413.Len() > 0 {


This should probably be a utility function, and not logic embedded within a defer checking the state of a shared variable...

for {
b, err := reader.ReadByte()
if err != nil {
if err != io.EOF {


This multi-nested conditional should be restructured to prevent the deep nesting. First example: if err == io.EOF { break }. The != case has an unconditional return at the end of the block, meaning the only way the break on line 217 gets reached is if err == io.EOF.
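
Something along these lines (illustrative only, with placeholder names):

```go
package sketch

import (
	"bufio"
	"io"
)

// readBytes handles io.EOF first with a break and bails out on any other
// error, so the happy path is not buried in nested conditionals.
func readBytes(reader *bufio.Reader, handle func(byte)) error {
	for {
		b, err := reader.ReadByte()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		handle(b)
	}
	return nil
}
```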

@@ -24,6 +24,14 @@ func (p *InfluxParser) Parse(buf []byte) ([]telegraf.Metric, error) {
// parse even if the buffer begins with a newline
buf = bytes.TrimPrefix(buf, []byte("\n"))


Not related to this PR, but could just as well be bytes.TrimLeftFunc(buf, unicode.IsSpace)

if err != nil {
if len(err.Error()) > 1024 {
err = fmt.Errorf("Error parsing influx line-protocol (error truncated): %s",
err.Error()[0:256])


Extra 0, can just be [:256]

http.NotFound(res, req)
}
}

func (t *HttpListener) parse(b []byte, errmsg *bytes.Buffer) {


I don't like the idea of passing in a buffer, just to modify its state (which then gets checked in a deferred function later...)


I would suggest returning an error and changing the order of the statements of this function: Iterate over the metrics returned first since that is unconditional, then conditionally return a (potentially modified) error.

// if we have a newline and we're nearing the end of the buffer,
// do a write and continue with a fresh buffer.
if buffer.Len() > MAX_ALLOCATION_SIZE-256*1000 && b == '\n' {
t.parse(buffer.Bytes(), &msg400)


This chunked processing differs in behavior from the standard influxdb handler in that these "chunks" will end up with separate timestamps, if they weren't set on the original points. If there's going to be chunked processing, the current time should be grabbed at the beginning of the processing, then passed around as the default time to models.ParsePointsWithPrecision. I see that models.ParsePoints is used in the parser, which just calls models.ParsePointsWithPrecision(buf, time.Now(), "n"). Capturing the timestamp earlier and replacing all uses of ParsePoints(buf) with ParsePointsWithPrecision(buf, ts, "n") would keep similar behavior to the database handler.
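
A sketch of that suggestion, assuming the influxdb models package and an illustrative wrapper function (not code from this PR):

```go
package sketch

import (
	"time"

	"github.com/influxdata/influxdb/models"
)

// parseChunks captures one timestamp before chunked processing begins and
// passes it to every parse call, so points without an explicit timestamp get
// the same default time regardless of which chunk they arrive in.
func parseChunks(chunks [][]byte) ([]models.Point, error) {
	ts := time.Now()
	var all []models.Point
	for _, chunk := range chunks {
		points, err := models.ParsePointsWithPrecision(chunk, ts, "n")
		if err != nil {
			return all, err
		}
		all = append(all, points...)
	}
	return all, nil
}
```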

sparrc (Contributor) commented Oct 24, 2016

thank you for the contribution @supershal, I'm going to close this in favor of #1915

Our hope is that #1915 will perform better in the long run than reading the request body all at once. It's possible that in some situations the reading of the request body may appear to time out more often, because it reads the body iteratively.

That being said, it should allocate significantly less memory in regular average usage, and should also put substantially less pressure on GC.

sparrc closed this Oct 24, 2016
Successfully merging this pull request may close these issues.

http_listener leaks memory