proposal: reflect: add mechanisms to allocate large slices only if possible #33060
Comments
Can anything be done to help existing programs fail gracefully? I imagine the proposed API would alter best practices for long-running programs to: "Always use reflect.Try*() instead of make() and append()." Since the goal is to keep allocated memory < N% of available memory, have you considered a runtime API which makes normal allocations optionally return nil or panic with "almost-OOM" (which is recoverable) if an allocation would exceed the limit?
If we change encoding/gob and encoding/json, that will automatically help a number of existing programs: they will see and report an error rather than running out of memory. A server will report an error back to the client, and continue serving other clients rather than crashing. There are likely a few other packages that could be changed similarly, such as, perhaps, archive/zip. I don't think it's necessary or appropriate to always use the new functions. It's appropriate to use the new functions in a server serving multiple concurrent clients where the size of the allocated memory is controlled by the client. When memory usage is not controlled by an external client, other mechanisms to limit memory use are almost certainly more appropriate. I don't think we can reasonably change existing allocations (by which I assume you mean the pre-declared make and append) in that way.
Let me add something:
I want to clarify that although that may be a reasonable goal, it's not the goal of these proposed functions. The goal of these proposed functions is to permit a program to avoid some crash cases when a client makes a request that requires a lot of memory to serve. We don't currently have any way to ask whether an allocation can succeed. This proposal adds that capability.
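To make that failure mode concrete, here is a minimal sketch of the kind of server these functions target. The handler, route, and payload shape are illustrative assumptions, not taken from the issue; the point is that under the proposed change to encoding/json, an oversized payload would surface as a Decode error affecting only that request, instead of an allocation that kills the whole process.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// handler decodes a client-supplied payload. Today a huge payload can exhaust
// memory and crash the entire server; with the proposed change, Decode would
// instead return an error here, and other clients would keep being served.
func handler(w http.ResponseWriter, r *http.Request) {
	var req struct {
		Items []string `json:"items"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "could not decode request: "+err.Error(), http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/ingest", handler)
	http.ListenAndServe(":8080", nil) // address is an arbitrary choice for the sketch
}
```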
You suggest a 5% threshold of remaining free memory for failure, but it seems to me that the appropriate threshold in a real server depends very strongly on the implementation details of that server. If the caller already knows that the allocation is large enough that it may fail, it seems simpler for them to apply a (weighted) semaphore before the allocation rather than a check after it, and that makes the overall memory fraction much easier to tailor to their specific server. What advantages does the post-hoc approach provide over the explicit semaphore?
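A rough sketch of that alternative, using the existing golang.org/x/sync/semaphore package; the budget size and the handleRequest shape are assumptions for illustration, not part of the proposal.

```go
package main

import (
	"fmt"

	"golang.org/x/sync/semaphore"
)

// memBudget caps the total bytes of client-driven allocations in flight.
// The 512 MiB figure is an assumed per-server budget, not something from the proposal.
var memBudget = semaphore.NewWeighted(512 << 20)

// handleRequest reserves the memory a request is expected to need before
// allocating it, and releases the reservation when the work is done.
func handleRequest(payloadSize int64) error {
	if !memBudget.TryAcquire(payloadSize) {
		return fmt.Errorf("no memory budget for a %d-byte request right now", payloadSize)
	}
	defer memBudget.Release(payloadSize)

	buf := make([]byte, payloadSize) // safe: the budget already accounts for these bytes
	_ = buf                          // ... decode and serve the request ...
	return nil
}

func main() {
	fmt.Println(handleRequest(64 << 20))
}
```

The trade-off is the one raised in the next comment: the caller has to know roughly how much memory the request will need before allocating it, which a JSON decoder does not know up front.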
The advantage of the post-hoc approach is that it can be added to existing packages like encoding/json without complicating their ABI. A server that takes json-encoded data from a client has no way of knowing how much memory will be required to represent that information. Another approach would be to invent a new interchange format that is easier to control, but JSON is what we have today. We can always add more knobs, either to encoding/json or to the entirely arbitrary 5% threshold, but I think it's worth thinking about what we can do without knobs. Especially since even if we had knobs most programs wouldn't use them.
It seems to me that this would allow binary maintainers to paper over flow-control bugs under moderate load in much the same way that increasing channel buffers allows maintainers to paper over concurrency bugs, and under similar conditions: it's an easy hammer to make the symptom go away without actually understanding or addressing the underlying problem. In particular, it's easy to use such a tool to “fix” flakiness in a simplified load-test without actually fixing the production risk that the test is intended to detect: it would produce recoverable errors for large overallocations (which are easy to provoke in a synthetic test), but still result in crashes for more modest ones (which are more likely to occur under organic growth).
Reflect seems pretty clunky; what about `x, ok = make([]int, N)` or `x, ok = append(x, y...)`?
(Also how do we know for sure whether a particular allocation should fail as out of memory? Overcommit means sometimes you think it's going to work and you get killed anyway.) |
The amount of memory available for allocation is a global property, and this proposal tries to deal with it using local information. The proposal comes with potential races between various allocations to see which one succeeds, which one fails via this proposal's functions, and which one fails with an OOM. Using these functions will increase technical debt as applications solve localized coding problems instead of taking a more global approach such as the one envisioned by SetMaxHeap.

Pragmatically, it would be difficult to create a reliable reproducer, it would decrease the robustness of tests, and it would increase the maintenance load on the core runtime team responding to error reports. Creating a reproducer will be further complicated by the fact that the proposal is sensitive to co-tenancy problems, so global is even “more global” (sorry) than reachability. Finally, any allocation not using this API can still result in an OOM, so it would not be possible for a high-level package API, say the JSON API, to claim that it will return an error instead of an OOM. The SetMaxHeap alternative should be allowed to mature and be used for several releases to see if it can address some of the problems that motivated this proposal. If a client can force a server to allocate unbounded amounts of memory, then perhaps the server can be architected to use a series of allocations of bounded size along with SetMaxHeap to fail gracefully.
I think this is the key point. Does SetMaxHeap address this case?
In the context of “encoding/json without adding knobs”, I don't see how SetMaxHeap would help: the API would need some way to allow the caller to cancel a call to Decode or Unmarshal in response to heap growth, and plumbing in any explicit cancellation mechanism would require an API change.
For SetMaxHeap to operate as intended it requires the application to be able to throttle its workload in order to smooth out spikes in memory demand, possibly by failing gracefully. SetMaxHeap simply monitors global heap usage and informs the application when usage exceeds a threshold. The throttling is up to the application.

A clever programmer could use SetMaxHeap to implement TryMakeSlice and TryAppend. TryMakeSlice/Append would know MaxHeap, GOGC, and whether any throttling had been requested. From there it would know how much memory is likely available; with the default GOGC it would be half of MaxHeap. This should provide a good indication of whether the allocation will succeed.

This is just one way to implement throttling. If one can't implement a reasonable TryMakeSlice/Append using SetMaxHeap, then that might indicate a deficiency in the SetMaxHeap API.
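To ground the arithmetic in the paragraph above, here is a rough sketch of such a helper built only from APIs that exist in the standard library today: runtime.ReadMemStats standing in for the SetMaxHeap signal, and an application-chosen soft limit standing in for MaxHeap. The limit value and the helper's name are assumptions, and, as earlier comments point out, the estimate is inherently racy against other goroutines' allocations.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// softHeapLimit is an assumed application budget; with the default GOGC=100,
// the live heap this check observes is roughly half of the total heap target.
const softHeapLimit = 1 << 30 // 1 GiB

// tryMakeSlice approximates the proposed reflect.TryMakeSlice by comparing the
// current heap size plus the requested allocation against the soft limit.
func tryMakeSlice(typ reflect.Type, n int) (reflect.Value, bool) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // briefly stops the world; acceptable for rare, large allocations
	need := uint64(n) * uint64(typ.Elem().Size())
	if ms.HeapAlloc+need > softHeapLimit {
		return reflect.Value{}, false
	}
	return reflect.MakeSlice(typ, n, n), true
}

func main() {
	typ := reflect.TypeOf([]byte(nil))
	if v, ok := tryMakeSlice(typ, 64<<20); ok {
		fmt.Println("allocated", v.Len(), "bytes")
	} else {
		fmt.Println("refused: allocation would exceed the soft heap limit")
	}
}
```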
There is a larger problem here than just allocating large slices. The difficult part is the "only if possible". It's not clear what the conditions are under which an allocation should fail. What if it leaves 1 byte of memory, or even 1MB of memory, and the next `new([2<<20]byte)` crashes instead? C lets malloc fail and many people do check it. We have a hard-coded limit in bytes.Buffer. For streaming outputs like encoding/gzip we have io.LimitWriter. This specific proposal of reflect handling just slice allocation is probably not enough to solve this larger problem. It's unclear how to move forward here.
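For contrast, a small sketch of the kind of cooperating limit that is available today, bounding the input bytes with io.LimitReader before decoding. The helper name and the 1 MiB cap are illustrative assumptions; note that this bounds bytes read from the wire, not the memory the decoded value itself needs, which is the gap the proposal is aimed at.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeBounded refuses to read more than maxBytes of input. Oversized input
// surfaces as a decode error (truncated JSON) rather than an allocation failure.
func decodeBounded(r io.Reader, maxBytes int64, v interface{}) error {
	return json.NewDecoder(io.LimitReader(r, maxBytes)).Decode(v)
}

func main() {
	var out []int
	err := decodeBounded(strings.NewReader("[1, 2, 3]"), 1<<20, &out)
	fmt.Println(out, err)
}
```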
IIRC, the protections for encoding/gob are incomplete. See #20221.
It's still unclear how to solve the underlying problem - give Go programs that care some way to limit allocation in code that is cooperating with the limits. Perhaps some kind of memory allocation context threaded through code (a little like an arena) would work, but that's getting pretty far from this specific proposal. Perhaps @ianlancetaylor will write more about an allocation-context-with-max in a different proposal. The difference is that the explicit context would be for a group of allocations, not one, and users would be in control of the limit. It seems like this one is a likely decline. Leaving open for a week for final comments.
Regarding a limit for a group of allocations, here's that idea for channel buffers: #20868. It would be great if an "arena" proposal encompassed that use case.
No change in consensus; closing.
We've seen several discussions about how to handle running out of memory in a Go program. In a quick look I found #5049 (closed), #14162 (closed), #16843 (open).
One way to think about memory allocations in a program is to observe that there are fixed-size allocations (`new(int)`, or simply letting the address of a variable escape) and there are variable-size allocations (`make([]int, N)`, `append(a, b...)`). If your program does too many fixed-size allocations, then you have a garbage collection problem; how to handle GC backpressure is discussed in the issues cited above. However, many server programs need to read variable-length data from a client, and they don't know how much space they will need when a request arrives. It seems useful to permit those programs to have some way of checking, as they allocate memory, whether that memory is available.

Any approach that relies on asking the garbage collector how much memory is available is prone to race conditions in a server, and relies on details of garbage collection. I propose a simpler mechanism:
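As a rough sketch, using the Try* names that appear in the comments above (the exact signatures are an assumption, not quoted from the issue), the proposed additions would look something like this:

```go
// Sketch of proposed additions to package reflect (signatures assumed).

// TryMakeSlice is like MakeSlice, but reports failure instead of exhausting
// memory when the requested slice is too large to allocate safely.
func TryMakeSlice(typ Type, len, cap int) (Value, bool)

// TryAppend is like Append, but reports failure if growing the slice would
// leave too little memory available.
func TryAppend(s Value, x ...Value) (Value, bool)
```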
(I note in passing that if we had generics, we could write these functions in a more natural way and put them in the runtime package. But the underlying idea remains the same.)
These functions in the reflect package are not the easiest to use, but they will permit servers to handle certain common cases of memory overload without simply crashing. We should set the criteria under which they would fail. For example, perhaps they should fail if after allocation less than 5% of available memory would remain unallocated.
We could then use these functions in packages like encoding/json and encoding/gob, so that those packages could reliably fail with an out of memory error when decoding a large incoming stream.