
Upstream TCP connection buffer and read buffer limits (#150). #571

Merged
merged 5 commits into envoyproxy:master from upstream-read-buffer-limit on Mar 16, 2017

Conversation

@htuch (Member) commented Mar 15, 2017

As with fd58242, but on the upstream cluster side.

@htuch htuch force-pushed the upstream-read-buffer-limit branch from 99ac310 to d6d41c9 Compare March 15, 2017 00:57
@@ -37,6 +38,10 @@ connect_timeout_ms
*(required, integer)* The timeout for new network connections to hosts in the cluster specified
in milliseconds.

per_connection_buffer_limit_bytes
*(optional, integer)* Soft limit on size of the cluster's connections read and write buffers.
If unspecified, an implementation defined default is applied (1MB).
Member:
nit: s/MB/MiB (can you fix this on the listener side also)
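
For context, here is a minimal C++ sketch of what a "soft" read buffer limit of this kind typically means in practice: reads from the socket are paused once the buffer reaches the limit and resumed once it drains, rather than data being dropped or the connection failing. This is an illustration only, not the actual Envoy ConnectionImpl code; the class and method names are invented for the example.

```cpp
#include <cstdint>

// Illustration only; not the actual Envoy implementation. A "soft" limit
// throttles the producer instead of failing or truncating data.
class BufferedConnection {
public:
  explicit BufferedConnection(uint64_t limit) : read_buffer_limit_(limit) {}

  // Called after bytes are appended to the read buffer from the socket.
  void onBytesBuffered(uint64_t bytes) {
    buffered_bytes_ += bytes;
    if (buffered_bytes_ >= read_buffer_limit_) {
      read_enabled_ = false; // stop reading from the socket; do not drop data
    }
  }

  // Called after the filter chain / codec drains bytes from the buffer.
  void onBytesDrained(uint64_t bytes) {
    buffered_bytes_ -= bytes;
    if (buffered_bytes_ < read_buffer_limit_) {
      read_enabled_ = true; // resume socket reads
    }
  }

  bool readEnabled() const { return read_enabled_; }

private:
  const uint64_t read_buffer_limit_; // e.g. the 1MiB implementation default
  uint64_t buffered_bytes_{0};
  bool read_enabled_{true};
};
```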

@@ -264,6 +264,7 @@ ConnPoolImpl::ActiveClient::ActiveClient(ConnPoolImpl& parent)
parent_.conn_connect_ms_ =
parent_.host_->cluster().stats().upstream_cx_connect_ms_.allocateSpan();
Upstream::Host::CreateConnectionData data = parent_.host_->createConnection(parent_.dispatcher_);
data.connection_->setReadBufferLimit(parent_.host_->cluster().perConnectionBufferLimitBytes());
Member:
It's pretty fragile that every caller that calls createConnection() needs to know to then set the read buffer limit. Can we have the createConnection() function do it? The host has access to its cluster and cluster info.
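
A rough sketch of what that suggestion might look like, assembled from the snippets and the stack trace later in this thread rather than copied from the final source; the exact signature of HostImpl::createConnection (parameter types in particular) is an assumption:

```cpp
// Sketch only. The point is that createConnection() itself applies the
// cluster's limit, so no caller can forget to.
Network::ClientConnectionPtr
HostImpl::createConnection(Event::Dispatcher& dispatcher, const ClusterInfo& cluster,
                           Network::Address::InstanceConstSharedPtr address) {
  Network::ClientConnectionPtr connection =
      cluster.sslContext() ? dispatcher.createSslClientConnection(*cluster.sslContext(), address)
                           : dispatcher.createClientConnection(address);
  // Applying the limit here, rather than in every caller as in the conn pool
  // diff above, centralizes the behavior.
  connection->setReadBufferLimit(cluster.perConnectionBufferLimitBytes());
  return connection;
}
```

The discussion below shows the change htuch actually made, including a null check that turned out to matter in tests.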

@mattklein123 (Member):

Sorry, my approval is premature; I clicked the wrong button.

Network::ClientConnectionPtr connection =
cluster.sslContext() ? dispatcher.createSslClientConnection(*cluster.sslContext(), address)
: dispatcher.createClientConnection(address);
if (cluster_) {
@mattklein123 (Member) commented Mar 15, 2017:

I don't recall why this function is/was static, but you can use the passed-in cluster; you don't need to use cluster_ (or check it for nullptr; I don't think it can ever be nullptr).

@htuch (Member, Author):
That's nicer, but there's a bug (probably just in tests?) that allows cluster_ to be null:

Program received signal SIGSEGV, Segmentation fault.
0x0000000001dfbbf8 in Upstream::HostImpl::createConnection (this=0x35cc8c0, dispatcher=..., cluster=..., address=...) at /source/source/common/upstream/upstream_impl.cc:36
36 connection->setReadBufferLimit(cluster.perConnectionBufferLimitBytes());
#0 0x0000000001dfbbf8 in Upstream::HostImpl::createConnection (this=0x35cc8c0, dispatcher=..., cluster=..., address=...) at /source/source/common/upstream/upstream_impl.cc:36
#1 0x0000000001dfba6d in Upstream::HostImpl::createConnection (this=0x35cc8c0, dispatcher=...) at /source/source/common/upstream/upstream_impl.cc:27
#2 0x0000000001dd490a in Upstream::HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onInterval (this=0x33f3800) at /source/source/common/upstream/health_checker_impl.cc:227
#3 0x0000000001dd455a in Upstream::HttpHealthCheckerImpl::HttpActiveHealthCheckSession::HttpActiveHealthCheckSession (this=0x33f3800, parent=..., host=...) at /source/source/common/upstream/health_checker_impl.cc:193
#4 0x0000000001dd43d7 in Upstream::HttpHealthCheckerImpl::start (this=0x34b3400) at /source/source/common/upstream/health_checker_impl.cc:186
#5 0x000000000186d2d6 in Upstream::HttpHealthCheckerImplTest_Success_Test::TestBody (this=0x35e6c00) at /source/test/common/upstream/health_checker_impl_test.cc:169
#6 0x000000000205bcd6 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#7 0x0000000002056a18 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#8 0x000000000203d965 in testing::Test::Run() ()
#9 0x000000000203e1f1 in testing::TestInfo::Run() ()
#10 0x000000000203e882 in testing::TestCase::Run() ()
#11 0x000000000204502d in testing::internal::UnitTestImpl::RunAllTests() ()
#12 0x000000000205cdd6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#13 0x00000000020576da in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#14 0x0000000002043cb7 in testing::UnitTest::Run() ()
#15 0x00000000019845c4 in RUN_ALL_TESTS () at /thirdparty_build/include/gtest/gtest.h:2233
#16 0x0000000001983709 in main (argc=1, argv=0x7fffffffec18) at /source/test/main.cc:21
Starting epoch 0

Looking into it.

Member:

OK, I'm guessing that's just a stupid test bug.

@htuch (Member, Author):

Turns out it was the connection, not the cluster, that could be null, and we should still test for that.

Network::ClientConnectionPtr connection =
cluster.sslContext() ? dispatcher.createSslClientConnection(*cluster.sslContext(), address)
: dispatcher.createClientConnection(address);
if (connection) {
Member:
IIRC in the prod code this can't actually return nullptr, so I think this must be due to a test issue? Can we fix the test and avoid the if check here? (Please verify though).

@mattklein123 (Member):
@htuch +1 looks good, but I think there are merge conflicts now.

@mattklein123 mattklein123 merged commit c44ee0d into envoyproxy:master Mar 16, 2017
@htuch htuch deleted the upstream-read-buffer-limit branch March 31, 2017 16:31
lambdai pushed a commit to lambdai/envoy-dai that referenced this pull request Jul 21, 2020
jpsim pushed a commit that referenced this pull request Nov 28, 2022
At the moment, we leak all `EnvoyHTTPStreamImpl` and `EnvoyHTTPCallbacks` instances with every stream due to the deliberate retain cycle used to keep the stream in memory and pass callbacks through to the platform layer while the stream is active. This ends up also leaking the callback closures specified by the end consumer, along with anything captured by those closures.

To resolve this problem, we will release the strong reference cycle that the stream holds to itself when the stream completes.

The state/callback for stream completion has to be stored somewhere in the `ios_context` type and retained since the `ios_context` is a struct and doesn't retain its members.

We considered having the `EnvoyHTTPCallbacks` hold a reference to the `EnvoyHTTPStreamImpl` rather than having `EnvoyHTTPStreamImpl` hold a strong reference to itself. However, this posed a few problems:
- `EnvoyHTTPCallbacks` is designed to support being shared by multiple streams, and thus cannot have only 1 reference to a stream
- `EnvoyHTTPCallbacks` is instantiated by the Swift layer, which means we'd have to leak implementation details of how the stream is kept in memory and shift logic out of the stream implementation

With these in mind, we decided to have `EnvoyHTTPStreamImpl` manage its own lifecycle based on the state of the network stream.

Signed-off-by: Michael Rebello <[email protected]>
Signed-off-by: JP Simard <[email protected]>
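
The lifecycle pattern described in this commit message lives in Objective-C (EnvoyHTTPStreamImpl), but the idea translates directly. Below is a hedged C++ analogue, with invented names, of an object that deliberately retains itself while a stream is active and releases that reference on completion, which is what breaks the cycle and frees the consumer's callbacks:

```cpp
#include <functional>
#include <memory>

// Illustration of the self-retain lifecycle pattern described above; this is
// not the Envoy Mobile code.
class Stream : public std::enable_shared_from_this<Stream> {
public:
  void start(std::function<void()> on_complete) {
    on_complete_ = std::move(on_complete);
    self_ = shared_from_this(); // deliberate "retain cycle": keep ourselves
                                // alive while the network stream is active
  }

  // Invoked by the network layer when the stream finishes or errors out.
  void onStreamComplete() {
    if (on_complete_) on_complete_();
    on_complete_ = nullptr; // drop consumer callbacks (and their captures)
    self_.reset();          // break the cycle; the object may now be freed
  }

private:
  std::function<void()> on_complete_;
  std::shared_ptr<Stream> self_;
};

// Usage: nothing external needs to keep the stream alive after start().
// auto stream = std::make_shared<Stream>();
// stream->start([] { /* consumer completion callback */ });
```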
jpsim pushed a commit that referenced this pull request Nov 29, 2022 (same commit message as above)