Pipelining causes NOAUTH errors with External Authentication on Envoy #2785

dbarbosapn · 2024-09-03T17:51:08Z

Hello,

We have a situation where our Redis client is returning NOAUTH during the connection flow, but if we manually connect to it through redis-cli and send an AUTH <password> command, from that exact same machine, we get OK and can send other commands after.

There is no place where we can see the actual result of the AUTH command sent by the SDK since, from what we've seen in code, it's a fire and forget message.
Any ideas how to debug this?

Important Note
We have multiple redis clusters that we're connecting to. If we put this cluster first, it works. If we put it last, it doesn't!

For example, this doesn't work:

ConnectionMultiplexer.Connect(options1);
...
ConnectionMultiplexer.Connect(options2);
...
ConnectionMultiplexer.Connect(options3);
...
ConnectionMultiplexer.Connect(options4);

But this does:

ConnectionMultiplexer.Connect(options4);
...
ConnectionMultiplexer.Connect(options2);
...
ConnectionMultiplexer.Connect(options3);
...
ConnectionMultiplexer.Connect(options1);

We're wondering if somehow the multiplexer might be using a wrong connection from a completely different instance. If there is some kind of static or thread-local state.

Some logs:

Connecting (sync) on .NET 8.0.8 (StackExchange.Redis: v2.8.0.27420)
<redacted>.internal.office.com:6380,syncTimeout=5000,connectTimeout=17000,abortConnect=False
<redacted>.internal.office.com:6380/Interactive: Connecting...
<redacted>.internal.office.com:6380: BeginConnectAsync
1 unique nodes specified (with tiebreaker)
<redacted>.internal.office.com:6380: OnConnectedAsync init (State=Connecting)
Allowing 1 endpoint(s) 00:00:17 to respond...
Awaiting 1 available task completion(s) for 17000ms, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=0,Free=32767,Min=4,Max=32767), POOL: (Threads=6,QueuedItems=0,CompletedItems=404,Timers=28)
Configuring TLS
TLS connection established successfully using protocol: Tls13
<redacted>.internal.office.com:6380/Interactive: Connected
<redacted>.internal.office.com:6380: Server handshake
<redacted>.internal.office.com:6380: Authenticating (password)
<redacted>.internal.office.com:6380: Sending critical tracer (handshake): PING
<redacted>.internal.office.com:6380: Flushing outbound buffer
<redacted>.internal.office.com:6380: OnEstablishingAsync complete
<redacted>.internal.office.com:6380: Starting read
Response from <redacted>.internal.office.com:6380/Interactive / PING: Error: NOAUTH Authentication required.
<redacted>.internal.office.com:6380: OnConnectedAsync completed (Disconnected)
All 1 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=2,Free=32765,Min=4,Max=32767), POOL: (Threads=6,QueuedItems=0,CompletedItems=422,Timers=27)
Endpoint summary:
<redacted>.internal.office.com:6380: Endpoint is (Interactive: Connecting, Subscription: Disconnected)
Task summary:
<redacted>.internal.office.com:6380: Returned, but incorrectly
Endpoint Summary:
<redacted>.internal.office.com:6380: Envoyproxy v6.0.0, primary; 1 databases; keep-alive: 00:01:00; int: Connecting; sub: n/a; not in use: DidNotRespond
<redacted>.internal.office.com:6380: int ops=2, qu=0, qs=0, qc=0, wr=0, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0
<redacted>.internal.office.com:6380: Circular op-count snapshot; int: 0+2=2 (0.20 ops/s; spans 10s); sub:
Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
Resetting failing connections to retry...
AuthenticationFailure on <redacted>.internal.office.com:6380/Interactive, Flushed/ComputeResult, last: PING, origin: SetResult, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablishing, mgr: 9 of 10 available, last-heartbeat: never, last-mbeat: 0s ago, global: 0s ago, v: 2.8.0.27420

The text was updated successfully, but these errors were encountered:

slorello89 · 2024-09-03T18:58:40Z

Are you putting your password in the ConfigurationOptions? because it should call the AUTH automatically.

dbarbosapn · 2024-09-03T19:02:55Z

Yes, we are.
You can even see on the logs it shows "Authenticating (password)".

This is really weird

slorello89 · 2024-09-03T19:10:41Z

Maybe try giving each of your ConfigurationOptions a unique SocketManager? That is a shared resource unless you overwrite it (though I wouldn't think it would matter because presumably all of your clusters are at different endpoints).

dbarbosapn · 2024-09-03T19:12:53Z

Yes all clusters are different endpoints...
Also maybe important information is that we see the AUTH call incoming on server-side, and we know it is authenticated successfully. Makes no sense how it would show NOAUTH after. But we cannot see the response the client is seeing, since it's not logged by the StackExchange.Redis SDK 😢

dbarbosapn · 2024-09-03T21:06:06Z

Using separate socket manager instance didn't fix the issue.
Same exact symptoms (logs look the same as well).

mgravell · 2024-09-04T07:14:57Z

Ok, this is very weird! To confirm, there should be no ambient state in play here, so I'm very confused. I'll ping you on Teams (myself and Nick are also MSFT) - it may be easier to get to the bottom of this directly rather than on GitHub (and obviously any fixes/knowledge-gained can still work their way back here).

mgravell · 2024-09-04T12:46:41Z

Just updating the public logs; envoy proxy is in play here; by adding RESP logging, it looks like we're seeing something very odd;

outbound:

AUTH whatever
PING

(where the PING is a critical tracer to check that we think the server is behaving properly)

inbound:

+OK
-NOAUTH

But with the weird timing complication; the other servers in the timing matrix are not behind envoy. Investigations continue.

dbarbosapn · 2024-09-04T19:41:25Z

Updated the title and updating here with what we think is the root cause so far.
(Thanks a lot @mgravell and @NickCraver for the help building a repro!)

Even though Envoy connections are single-threaded by nature, when external authentication (new feature, more info here) is enabled, the authentication is done asynchronously.

This means that if the AUTH command is pipelined with the PING command, we will have exactly the repro we've seen.

mgravell · 2024-09-05T06:56:10Z

Emphasis: fundamentally this is an envoy bug. They must not process commands out of order (even if they buffer the response until appropriate) if the connection state is relevant, and here: the connection state is relevant. Is this logged with envoy? If not, I'm happy to log it.

dbarbosapn · 2024-09-05T07:36:08Z

FYI:
envoyproxy/envoy#35983

mgravell · 2024-09-12T14:21:39Z

Envoy fix merged; closing - thanks for the report; between us we got to the bottom of it!

dbarbosapn changed the title ~~Redis returns NOAUTH on SDK but works fine through redis-cli~~ Order of Connect() calls causes NOAUTH - Threading issue? Sep 3, 2024

dbarbosapn changed the title ~~Order of Connect() calls causes NOAUTH - Threading issue?~~ Pipelining causes an issue with External Authentication on Envoy Sep 4, 2024

dbarbosapn changed the title ~~Pipelining causes an issue with External Authentication on Envoy~~ Pipelining causes NOAUTH errors with External Authentication on Envoy Sep 4, 2024

dbarbosapn mentioned this issue Sep 4, 2024

Fix #2785: Support optionally disabling pipelining on AUTH flow #2787

Closed

mgravell closed this as completed Sep 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pipelining causes NOAUTH errors with External Authentication on Envoy #2785

Pipelining causes NOAUTH errors with External Authentication on Envoy #2785

dbarbosapn commented Sep 3, 2024 •

edited

Loading

slorello89 commented Sep 3, 2024

dbarbosapn commented Sep 3, 2024

slorello89 commented Sep 3, 2024

dbarbosapn commented Sep 3, 2024

dbarbosapn commented Sep 3, 2024

mgravell commented Sep 4, 2024

mgravell commented Sep 4, 2024

dbarbosapn commented Sep 4, 2024 •

edited

Loading

mgravell commented Sep 5, 2024

dbarbosapn commented Sep 5, 2024

mgravell commented Sep 12, 2024

Pipelining causes NOAUTH errors with External Authentication on Envoy #2785

Pipelining causes NOAUTH errors with External Authentication on Envoy #2785

Comments

dbarbosapn commented Sep 3, 2024 • edited Loading

slorello89 commented Sep 3, 2024

dbarbosapn commented Sep 3, 2024

slorello89 commented Sep 3, 2024

dbarbosapn commented Sep 3, 2024

dbarbosapn commented Sep 3, 2024

mgravell commented Sep 4, 2024

mgravell commented Sep 4, 2024

dbarbosapn commented Sep 4, 2024 • edited Loading

mgravell commented Sep 5, 2024

dbarbosapn commented Sep 5, 2024

mgravell commented Sep 12, 2024

dbarbosapn commented Sep 3, 2024 •

edited

Loading

dbarbosapn commented Sep 4, 2024 •

edited

Loading