Using gRPC over named pipes causes issues when `grpc.client_idle_timeout_ms` elapses #2642

pelletier197 · 2024-01-16T19:49:19Z

Problem description

When using gRPC over Named pipes on Windows (IPC), while also firing requests continuously using multiple async requests for over 30 minutes, the connection eventually drops and never recovers. Another important detail is that I use only streaming requests with the server.

This is the actual logs that I get

[electron] D 2024-01-16T19:01:17.114Z | v1.9.13 25544 | connectivity_state | (6) unix:////./pipe/32ed9f96fe90c717 IDLE -> IDLE
[electron] D 2024-01-16T19:01:18.184Z | v1.9.13 25544 | channel | (6) unix:////./pipe/32ed9f96fe90c717 Idle timer triggered after 1800000ms of inactivity
[electron] D 2024-01-16T19:01:18.185Z | v1.9.13 25544 | connectivity_state | (6) unix:////./pipe/32ed9f96fe90c717 IDLE -> IDLE
[electron] [2024-01-16T14:01:20]     INFO Failed to scan file: Error: Failed to connect before the deadline. 4 attempts left.

I have traced the issue back to the property grpc.client_idle_timeout_ms, and I believe this issue might be linked to this function.

I have tried using tcp instead of named pipes and it works without any problem.

Reproduction steps

It is very difficult to give exact reproduction steps, as I have no idea what might be causing this. I have a huge amount of code around it, I think that someone with knowledge of the code might better understand why my connection is detected as idle even though I'm using it like crazy for streaming requests. Is it possible that streaming requests don't count as requests and that the connection timeout is not reset ?

Environment

Windows 11, AMD64
Node version 18.16
Node installation method = Choco
If applicable, compiler version [e.g. clang 3.8.0-2ubuntu4]
Package name and version = @grpc/[email protected]

The text was updated successfully, but these errors were encountered:

murgatroid99 · 2024-01-16T20:41:49Z

Those logs show a connection that is already IDLE, but it doesn't give any indication of why it became IDLE, or what else is going on that should affect the state. It would help to see the full log, from the start of the process to after you observe this problem.

pelletier197 · 2024-01-16T21:26:59Z

Here are the logs. I have not included every log since the start of the application, since it has to run for 30 minutes before I get the issue, but I did put a few seconds before the issue arises. There's nothing interesting before and after the portion I shared.

I annotated exactly where the requests start to fail. It seems clear that the connection goes Idle because something times out, even though we see that the connection receives requests a few seconds before that. We also see that the connection switches between idle and connecting continuously after the idle timeout is reached.

logs.log

There is a little bit of my application scrambled with it, but grpc logs are easy to differentiate.

murgatroid99 · 2024-01-16T21:38:29Z

That's definitely a bug. It looks like idle timer references are getting discarded but not cancelled, so they just fire later whether or not they should. I'm guessing you're calling waitForReady at the beginning of your application, possibly multiple times. I think that's what would trigger it. I think I know how to fix this, so I'll get that out soon.

pelletier197 · 2024-01-16T21:50:17Z

I do call waitForReady at the begin of every request actually. I'm not sure what's the best practice for this, I believe it's because it's possible that I do the first request before the other process is ready, and so doing the request immediately would fail every time for the first few requests and pollute my logs. It's also because my other process could be re-started occasionally, so that would help handle that gracefully.

I could have perfectly achieved that if there was a getter for the connection state instead, but that didn't exist so I used that function. Does that make sense ?

murgatroid99 · 2024-01-16T22:05:12Z

If I understand correctly, the behavior you want is if a connection drops, any calls made between then and when a connection is re-established will wait for the connection to be ready and then make the request, instead of failing immediately. You can accomplish that by providing metadata with the waitForReady option set: new Metadata({waitForReady: true}) (unfortunately similarly named for historical reasons).

In addition, there is actually a getter in the public API for the connection state, but it's a little obscure: client.getChannel().getConnectivityState(false). I want to note that I do not recommend using this: you are almost always better off allowing gRPC to manage connectivity state transparently. Please try out the waitForReady option before using this.

pelletier197 · 2024-01-16T22:09:41Z

I did use waitForReady before as well, and if I remember well, I think my issue was because waitForReady could actually hang forever (there seemed to be no timeout). Might be wrong, it's been a while since I did that.

murgatroid99 · 2024-01-16T22:17:10Z

In general, you want to set a deadline on the call itself, but that is especially important when using the waitForReady option. You are right that it can wait forever otherwise.

pelletier197 · 2024-01-16T22:25:32Z

Yeah that makes sense. My case is a bit complicated, because I send files to be scanned to an external process, and scan time can vary depending on the file. I didn't want to deadline to whole request, but I wanted to deadline the connection part, which wasn't possible. I don't expect this to be supported as it's fairly unique to my use-case, that's why I check connectivity manually. Thanks for the piece of code you shared above, that's really helpful!

murgatroid99 · 2024-01-16T22:35:01Z

I see. I understand why a request deadline wouldn't be good for your use case. However, if you are willing to adjust your protocol a little, I see another way to create an effective connection timeout without calling the waitForReady method. First, you would change your method handler on the server so that it always sends response metadata as soon as it receives a request. Then on your client, you would make requests with the waitForReady option set. The client would set a timer for the connection timeout and watch for the response metadata event. If the connection timeout timer fires before receiving the response metadata, you cancel the call and do whatever you want to do in the event of a connection timeout.

pelletier197 · 2024-01-16T23:01:59Z

I'll look into that. Thanks for the tip, and thanks for the quick reply and fix 😺

pelletier197 · 2024-01-17T13:37:42Z

I see the PR has been merged and released. Thanks a lot for this, I'm gonna close the issue 😃 Have a good one!

vamsideepak · 2024-01-26T14:51:33Z

#2649
@murgatroid99 @pelletier197 Can someone please help me in this ?

pelletier197 changed the title ~~Using gRPC over named pipes causes issues when~~ Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses Jan 16, 2024

pelletier197 changed the title ~~Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses~~ Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses Jan 16, 2024

murgatroid99 mentioned this issue Jan 16, 2024

grpc-js: Fix and optimize IDLE timeouts #2643

Merged

pelletier197 closed this as completed Jan 17, 2024

murgatroid99 mentioned this issue Jan 26, 2024

grpc-js not supporting the namedpipe communication #2649

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using gRPC over named pipes causes issues when `grpc.client_idle_timeout_ms` elapses #2642

Using gRPC over named pipes causes issues when `grpc.client_idle_timeout_ms` elapses #2642

pelletier197 commented Jan 16, 2024

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 •

edited

Loading

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 •

edited

Loading

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 •

edited

Loading

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 •

edited

Loading

pelletier197 commented Jan 17, 2024

vamsideepak commented Jan 26, 2024 •

edited

Loading

Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses #2642

Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses #2642

Comments

pelletier197 commented Jan 16, 2024

Problem description

Reproduction steps

Environment

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 • edited Loading

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 • edited Loading

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 • edited Loading

murgatroid99 commented Jan 16, 2024

pelletier197 commented Jan 16, 2024 • edited Loading

pelletier197 commented Jan 17, 2024

vamsideepak commented Jan 26, 2024 • edited Loading

Using gRPC over named pipes causes issues when `grpc.client_idle_timeout_ms` elapses #2642

Using gRPC over named pipes causes issues when `grpc.client_idle_timeout_ms` elapses #2642

pelletier197 commented Jan 16, 2024 •

edited

Loading

pelletier197 commented Jan 16, 2024 •

edited

Loading

pelletier197 commented Jan 16, 2024 •

edited

Loading

pelletier197 commented Jan 16, 2024 •

edited

Loading

vamsideepak commented Jan 26, 2024 •

edited

Loading