Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel closure on genserver shutdown #186

Closed
nulian opened this issue Feb 19, 2021 · 8 comments
Closed

Channel closure on genserver shutdown #186

nulian opened this issue Feb 19, 2021 · 8 comments

Comments

@nulian
Copy link

nulian commented Feb 19, 2021

This issue is a bit of a continuation on issue #174 and MR #184.

When using DirectConsumer I would expect the channel to close when the GenServer stops.
But when DynamicSupervisor.terminate_child is used you get the issue at the bottom of this issue.
The MR I linked is a kind of fix on this that you no longer see the errors but the channel also won't close.
On this error below the channel also doesn't close but amqp also no longer tracks the channel causing the next error when you try to open a new channel on the same connection.
11:06:31.022 [warn] Connection (#PID<0.289.0>) closing: received hard error {:"connection.close", 504, "CHANNEL_ERROR - second 'channel.open' seen", 20, 10} from server

Currently still researching how I can force connection to correctly close without calling Channel.close in the genserver. This is mostly that when it crashes it should be closed correctly either way. Because else you get random channel leaks that won't truly resolve.

After doing more research and building a test case app you can reproduce it following the calls in the code block below.
https://github.com/nulian/rabbit_test

iex -S mix

iex(1)> RabbitSupervisor.start_child()
{:ok, #PID<0.284.0>}
iex(2)> RabbitSupervisor.start_child()
{:ok, #PID<0.290.0>}
iex(3)> RabbitSupervisor.start_child()
{:ok, #PID<0.296.0>}
iex(4)> {_, pid} = v()                                          
{:ok, #PID<0.296.0>}
iex(5)> DynamicSupervisor.terminate_child(RabbitSupervisor, pid)
:ok
iex(6)> 11:07:28.784 [error] gen_server <0.298.0> terminated with reason: {error,{consumer_died,shutdown}}

11:07:28.784 [error] GenServer #PID<0.298.0> terminating
** (stop) {:error, {:consumer_died, :shutdown}}
Last message: {:DOWN, #Reference<0.3377027747.3486515201.88223>, :process, #PID<0.296.0>, :shutdown}
State: {:state, AMQP.DirectConsumer, #PID<0.296.0>}
11:07:28.788 [error] CRASH REPORT Process <0.298.0> with 0 neighbours exited with reason: {error,{consumer_died,shutdown}} in gen_server2:terminate/3 line 1183
11:07:28.788 [error] Supervisor {<0.297.0>,amqp_channel_sup} had child gen_consumer started with amqp_gen_consumer:start_link('Elixir.AMQP.DirectConsumer', <0.296.0>, {<<"client 127.0.0.1:47123 -> 127.0.0.1:5672">>,3}) at <0.298.0> exit with reason {error,{consumer_died,shutdown}} in context child_terminated
11:07:28.788 [error] Supervisor {<0.297.0>,amqp_channel_sup} had child gen_consumer started with amqp_gen_consumer:start_link('Elixir.AMQP.DirectConsumer', <0.296.0>, {<<"client 127.0.0.1:47123 -> 127.0.0.1:5672">>,3}) at <0.298.0> exit with reason reached_max_restart_intensity in context shutdown
@nulian
Copy link
Author

nulian commented Feb 24, 2021

Don't see any way of closing a channel when it has errored like that.
The error causes the erlang genservers and supervisor to go down but doesn't take the channel with it so it stays open while the library forgets it's still open. Which in the end causes the library to try to open an already open channel.
Seems like a bug in the erlang amqp_client library that already doesn't work correctly with direct consumer because when I use the erlang direct consumer module it causes the same issue.
Though don't know enough erlang to find a good solution for that.

@ono
Copy link
Collaborator

ono commented Feb 24, 2021

Interesting! Confirmed that the channel process goes down without closing the channel properly. Since amqp_client suggests to close channel explicitly, I am not sure if the upstream library would be interested. it's worth trying though.

My suggestions:

  • Trap exit with Process.flag(:trap_exit, true) in your GenServer and make sure closing the channel explicitly there
  • Use a default consumer (SelectiveConsumer). It decouples nicely consumers from a channel. AMQP 2.0 has some improvement on its structure and there is not much difference from the performance point of view between Direct and Selective

@michaelklishin
Copy link

amqp_direct_consumer docs say

%% Warning! It is not recommended to rely on a consumer on killing off the
%% channel (through the exit signal). That may cause messages to get lost.
%% Always use amqp_channel:close/{1,3} for a clean shut down.

In general, consumers in RabbitMQ clients do not close channels when they run into an unhandled exception. Java client behaved this
way years ago and most users did not like it (it caught them by surprise), so now there are consumer implementations of both types,
"strict" and "forgiving", and I'm pretty sure the majority of the ecosystem these days uses the latter.

In addition it matters if the consumer was shut down cleanly or terminated abnormally. In case of a clean shutdown
I guess consumers could shut down their channels but I would argue that it's none of their business. The process that
starts both (opens the channel, spawn/starts a consumer) could decide what to do and why.

Erlang (and Elixir by proxy) allows you to link two processes so that they go down together. Libraries such as the RabbitMQ AMQP 0-9-1 client or this Elixir façade to it really should not be too opinionated because usually it is easy enough for your
to express how exactly you want process tree shutdown to happen.

@AndrewDryga
Copy link

AndrewDryga commented Feb 25, 2021

We suffer from a similar issue with "CHANNEL_ERROR - second 'channel.open' seen" error. This happens on CI when we start/close supervised consumers in setup/on_exit blocks, it looks like the channel is getting reused somewhere behind the scenes an error is returned because those channels are not in a state that can accept open command.

@nulian
Copy link
Author

nulian commented Feb 26, 2021

I continued it here findings are mostly is when you consume and not close the channel correctly but the process goes down it can crash the consumer which can crash the channel genservers without actually closing the connection.

rabbitmq/rabbitmq-server#2845

@ono
Copy link
Collaborator

ono commented Feb 27, 2021

hey @michaelklishin. thanks for jumping in the issue - great to get feedback from rabbitmq team.

I think this Github issue is discussing a few different things. I would like to clarify here.

When using DirectConsumer I would expect the channel to close when the GenServer stops.

Like the document of Erlang DirectConsumer states, you should close the channel explicitly.

11:06:31.022 [warn] Connection (#PID<0.289.0>) closing: received hard error {:"connection.close", 504, "CHANNEL_ERROR - second 'channel.open' seen", 20, 10} from server

This means - when the consumer is unexpectedly down it can cause the connection error. I am concerned with this issue. While I leave the discussion to the rabbitmq team Elixir library would provide two options.

@ono
Copy link
Collaborator

ono commented Mar 1, 2021

#192 will provide a workaround for the issue. I also added more information in the moduledoc. I will leave the connection error discussion in the upstream library. I am closing the issue but feel free to reopen if any actions needed from this library. thanks!

@lud-wj
Copy link

lud-wj commented Mar 28, 2023

Hello,

I am not sure if I understand correctly the different comments here.

With the current implementation, does the channel get automatically cleaned up if the process that called AMQP.Channel.open(conn) crashes ?

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants