Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses #2642

Closed
pelletier197 opened this issue Jan 16, 2024 · 12 comments

Comments

@pelletier197
Copy link

Problem description

When using gRPC over Named pipes on Windows (IPC), while also firing requests continuously using multiple async requests for over 30 minutes, the connection eventually drops and never recovers. Another important detail is that I use only streaming requests with the server.

This is the actual logs that I get

[electron] D 2024-01-16T19:01:17.114Z | v1.9.13 25544 | connectivity_state | (6) unix:////./pipe/32ed9f96fe90c717 IDLE -> IDLE
[electron] D 2024-01-16T19:01:18.184Z | v1.9.13 25544 | channel | (6) unix:////./pipe/32ed9f96fe90c717 Idle timer triggered after 1800000ms of inactivity
[electron] D 2024-01-16T19:01:18.185Z | v1.9.13 25544 | connectivity_state | (6) unix:////./pipe/32ed9f96fe90c717 IDLE -> IDLE
[electron] [2024-01-16T14:01:20]     INFO Failed to scan file: Error: Failed to connect before the deadline. 4 attempts left.

I have traced the issue back to the property grpc.client_idle_timeout_ms, and I believe this issue might be linked to this function.

I have tried using tcp instead of named pipes and it works without any problem.

Reproduction steps

It is very difficult to give exact reproduction steps, as I have no idea what might be causing this. I have a huge amount of code around it, I think that someone with knowledge of the code might better understand why my connection is detected as idle even though I'm using it like crazy for streaming requests. Is it possible that streaming requests don't count as requests and that the connection timeout is not reset ?

Environment

  • Windows 11, AMD64
  • Node version 18.16
  • Node installation method = Choco
  • If applicable, compiler version [e.g. clang 3.8.0-2ubuntu4]
  • Package name and version = @grpc/[email protected]
@pelletier197 pelletier197 changed the title Using gRPC over named pipes causes issues when Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses Jan 16, 2024
@pelletier197 pelletier197 changed the title Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses Using gRPC over named pipes causes issues when grpc.client_idle_timeout_ms elapses Jan 16, 2024
@murgatroid99
Copy link
Member

Those logs show a connection that is already IDLE, but it doesn't give any indication of why it became IDLE, or what else is going on that should affect the state. It would help to see the full log, from the start of the process to after you observe this problem.

@pelletier197
Copy link
Author

pelletier197 commented Jan 16, 2024

Here are the logs. I have not included every log since the start of the application, since it has to run for 30 minutes before I get the issue, but I did put a few seconds before the issue arises. There's nothing interesting before and after the portion I shared.

I annotated exactly where the requests start to fail. It seems clear that the connection goes Idle because something times out, even though we see that the connection receives requests a few seconds before that. We also see that the connection switches between idle and connecting continuously after the idle timeout is reached.

logs.log

There is a little bit of my application scrambled with it, but grpc logs are easy to differentiate.

@murgatroid99
Copy link
Member

That's definitely a bug. It looks like idle timer references are getting discarded but not cancelled, so they just fire later whether or not they should. I'm guessing you're calling waitForReady at the beginning of your application, possibly multiple times. I think that's what would trigger it. I think I know how to fix this, so I'll get that out soon.

@pelletier197
Copy link
Author

pelletier197 commented Jan 16, 2024

I do call waitForReady at the begin of every request actually. I'm not sure what's the best practice for this, I believe it's because it's possible that I do the first request before the other process is ready, and so doing the request immediately would fail every time for the first few requests and pollute my logs. It's also because my other process could be re-started occasionally, so that would help handle that gracefully.

I could have perfectly achieved that if there was a getter for the connection state instead, but that didn't exist so I used that function. Does that make sense ?

@murgatroid99
Copy link
Member

If I understand correctly, the behavior you want is if a connection drops, any calls made between then and when a connection is re-established will wait for the connection to be ready and then make the request, instead of failing immediately. You can accomplish that by providing metadata with the waitForReady option set: new Metadata({waitForReady: true}) (unfortunately similarly named for historical reasons).

In addition, there is actually a getter in the public API for the connection state, but it's a little obscure: client.getChannel().getConnectivityState(false). I want to note that I do not recommend using this: you are almost always better off allowing gRPC to manage connectivity state transparently. Please try out the waitForReady option before using this.

@pelletier197
Copy link
Author

I did use waitForReady before as well, and if I remember well, I think my issue was because waitForReady could actually hang forever (there seemed to be no timeout). Might be wrong, it's been a while since I did that.

@murgatroid99
Copy link
Member

In general, you want to set a deadline on the call itself, but that is especially important when using the waitForReady option. You are right that it can wait forever otherwise.

@pelletier197
Copy link
Author

pelletier197 commented Jan 16, 2024

Yeah that makes sense. My case is a bit complicated, because I send files to be scanned to an external process, and scan time can vary depending on the file. I didn't want to deadline to whole request, but I wanted to deadline the connection part, which wasn't possible. I don't expect this to be supported as it's fairly unique to my use-case, that's why I check connectivity manually. Thanks for the piece of code you shared above, that's really helpful!

@murgatroid99
Copy link
Member

I see. I understand why a request deadline wouldn't be good for your use case. However, if you are willing to adjust your protocol a little, I see another way to create an effective connection timeout without calling the waitForReady method. First, you would change your method handler on the server so that it always sends response metadata as soon as it receives a request. Then on your client, you would make requests with the waitForReady option set. The client would set a timer for the connection timeout and watch for the response metadata event. If the connection timeout timer fires before receiving the response metadata, you cancel the call and do whatever you want to do in the event of a connection timeout.

@pelletier197
Copy link
Author

pelletier197 commented Jan 16, 2024

I'll look into that. Thanks for the tip, and thanks for the quick reply and fix 😺

@pelletier197
Copy link
Author

I see the PR has been merged and released. Thanks a lot for this, I'm gonna close the issue 😃 Have a good one!

@vamsideepak
Copy link

vamsideepak commented Jan 26, 2024

#2649
@murgatroid99 @pelletier197 Can someone please help me in this ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants