Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipelining causes NOAUTH errors with External Authentication on Envoy #2785

Closed
dbarbosapn opened this issue Sep 3, 2024 · 11 comments
Closed

Comments

@dbarbosapn
Copy link
Contributor

dbarbosapn commented Sep 3, 2024

Hello,

We have a situation where our Redis client is returning NOAUTH during the connection flow, but if we manually connect to it through redis-cli and send an AUTH <password> command, from that exact same machine, we get OK and can send other commands after.

There is no place where we can see the actual result of the AUTH command sent by the SDK since, from what we've seen in code, it's a fire and forget message.
Any ideas how to debug this?

Important Note
We have multiple redis clusters that we're connecting to. If we put this cluster first, it works. If we put it last, it doesn't!

For example, this doesn't work:

ConnectionMultiplexer.Connect(options1);
...
ConnectionMultiplexer.Connect(options2);
...
ConnectionMultiplexer.Connect(options3);
...
ConnectionMultiplexer.Connect(options4);

But this does:

ConnectionMultiplexer.Connect(options4);
...
ConnectionMultiplexer.Connect(options2);
...
ConnectionMultiplexer.Connect(options3);
...
ConnectionMultiplexer.Connect(options1);

We're wondering if somehow the multiplexer might be using a wrong connection from a completely different instance. If there is some kind of static or thread-local state.

Some logs:

Connecting (sync) on .NET 8.0.8 (StackExchange.Redis: v2.8.0.27420)
<redacted>.internal.office.com:6380,syncTimeout=5000,connectTimeout=17000,abortConnect=False
<redacted>.internal.office.com:6380/Interactive: Connecting...
<redacted>.internal.office.com:6380: BeginConnectAsync
1 unique nodes specified (with tiebreaker)
<redacted>.internal.office.com:6380: OnConnectedAsync init (State=Connecting)
Allowing 1 endpoint(s) 00:00:17 to respond...
Awaiting 1 available task completion(s) for 17000ms, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=0,Free=32767,Min=4,Max=32767), POOL: (Threads=6,QueuedItems=0,CompletedItems=404,Timers=28)
Configuring TLS
TLS connection established successfully using protocol: Tls13
<redacted>.internal.office.com:6380/Interactive: Connected
<redacted>.internal.office.com:6380: Server handshake
<redacted>.internal.office.com:6380: Authenticating (password)
<redacted>.internal.office.com:6380: Sending critical tracer (handshake): PING
<redacted>.internal.office.com:6380: Flushing outbound buffer
<redacted>.internal.office.com:6380: OnEstablishingAsync complete
<redacted>.internal.office.com:6380: Starting read
Response from <redacted>.internal.office.com:6380/Interactive / PING: Error: NOAUTH Authentication required.
<redacted>.internal.office.com:6380: OnConnectedAsync completed (Disconnected)
All 1 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=2,Free=32765,Min=4,Max=32767), POOL: (Threads=6,QueuedItems=0,CompletedItems=422,Timers=27)
Endpoint summary:
<redacted>.internal.office.com:6380: Endpoint is (Interactive: Connecting, Subscription: Disconnected)
Task summary:
<redacted>.internal.office.com:6380: Returned, but incorrectly
Endpoint Summary:
<redacted>.internal.office.com:6380: Envoyproxy v6.0.0, primary; 1 databases; keep-alive: 00:01:00; int: Connecting; sub: n/a; not in use: DidNotRespond
<redacted>.internal.office.com:6380: int ops=2, qu=0, qs=0, qc=0, wr=0, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0
<redacted>.internal.office.com:6380: Circular op-count snapshot; int: 0+2=2 (0.20 ops/s; spans 10s); sub:
Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
Resetting failing connections to retry...
AuthenticationFailure on <redacted>.internal.office.com:6380/Interactive, Flushed/ComputeResult, last: PING, origin: SetResult, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablishing, mgr: 9 of 10 available, last-heartbeat: never, last-mbeat: 0s ago, global: 0s ago, v: 2.8.0.27420
@dbarbosapn dbarbosapn changed the title Redis returns NOAUTH on SDK but works fine through redis-cli Order of Connect() calls causes NOAUTH - Threading issue? Sep 3, 2024
@slorello89
Copy link
Collaborator

Are you putting your password in the ConfigurationOptions? because it should call the AUTH automatically.

@dbarbosapn
Copy link
Contributor Author

Yes, we are.
You can even see on the logs it shows "Authenticating (password)".

This is really weird

@slorello89
Copy link
Collaborator

Maybe try giving each of your ConfigurationOptions a unique SocketManager? That is a shared resource unless you overwrite it (though I wouldn't think it would matter because presumably all of your clusters are at different endpoints).

@dbarbosapn
Copy link
Contributor Author

Yes all clusters are different endpoints...
Also maybe important information is that we see the AUTH call incoming on server-side, and we know it is authenticated successfully. Makes no sense how it would show NOAUTH after. But we cannot see the response the client is seeing, since it's not logged by the StackExchange.Redis SDK 😢

@dbarbosapn
Copy link
Contributor Author

Using separate socket manager instance didn't fix the issue.
Same exact symptoms (logs look the same as well).

@mgravell
Copy link
Collaborator

mgravell commented Sep 4, 2024

Ok, this is very weird! To confirm, there should be no ambient state in play here, so I'm very confused. I'll ping you on Teams (myself and Nick are also MSFT) - it may be easier to get to the bottom of this directly rather than on GitHub (and obviously any fixes/knowledge-gained can still work their way back here).

@mgravell
Copy link
Collaborator

mgravell commented Sep 4, 2024

Just updating the public logs; envoy proxy is in play here; by adding RESP logging, it looks like we're seeing something very odd;

outbound:

AUTH whatever
PING

(where the PING is a critical tracer to check that we think the server is behaving properly)

inbound:

+OK
-NOAUTH

But with the weird timing complication; the other servers in the timing matrix are not behind envoy. Investigations continue.

@dbarbosapn dbarbosapn changed the title Order of Connect() calls causes NOAUTH - Threading issue? Pipelining causes an issue with External Authentication on Envoy Sep 4, 2024
@dbarbosapn dbarbosapn changed the title Pipelining causes an issue with External Authentication on Envoy Pipelining causes NOAUTH errors with External Authentication on Envoy Sep 4, 2024
@dbarbosapn
Copy link
Contributor Author

dbarbosapn commented Sep 4, 2024

Updated the title and updating here with what we think is the root cause so far.
(Thanks a lot @mgravell and @NickCraver for the help building a repro!)

Even though Envoy connections are single-threaded by nature, when external authentication (new feature, more info here) is enabled, the authentication is done asynchronously.

This means that if the AUTH command is pipelined with the PING command, we will have exactly the repro we've seen.

@mgravell
Copy link
Collaborator

mgravell commented Sep 5, 2024

Emphasis: fundamentally this is an envoy bug. They must not process commands out of order (even if they buffer the response until appropriate) if the connection state is relevant, and here: the connection state is relevant. Is this logged with envoy? If not, I'm happy to log it.

@dbarbosapn
Copy link
Contributor Author

FYI:
envoyproxy/envoy#35983

@mgravell
Copy link
Collaborator

Envoy fix merged; closing - thanks for the report; between us we got to the bottom of it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants