Shutting down Cassandra node causes process exit #359

Closed
harunzengin opened this issue Mar 5, 2024 · 2 comments · Fixed by #365

@harunzengin
Contributor

harunzengin commented Mar 5, 2024

While testing #358, I noticed that shutting down a node produces exits like the following:

 ** (stop) exited in: :gen_statem.call(#PID<0.5947.0>, {:checkout_state_for_next_request, #Reference<0.0.460547.2884031648.17891330.200184>}, :infinity)
     ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     (stdlib 5.2) gen.erl:246: :gen.do_call/4
     (stdlib 5.2) gen_statem.erl:923: :gen_statem.call/3
     (xandra 0.18.1) lib/xandra/connection.ex:158: Xandra.Connection.execute/4
     (xandra 0.18.1) lib/xandra.ex:1272: Xandra.execute_without_retrying/4
     (xandra 0.18.1) lib/xandra/retry_strategy.ex:309: Xandra.RetryStrategy.run_on_cluster/5

My guess is that the connection processes are terminated right after Xandra.Cluster.Pool.checkout returns their pids, so the subsequent call exits and takes the client process down with it. In that case, I think the RetryStrategy never gets a chance to try the query on another node.
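To make the failure mode concrete, here is a minimal sketch that uses only the standard library and no Xandra code. The stale pid stands in for a connection pid that was checked out just before its process went away, and `:any_request` is a placeholder message, not Xandra's real request. It suggests that the call does not return an error tuple; it exits the caller, with a reason shaped like `{:noproc, _}`:

```elixir
# Spawn a throwaway process and wait for it to exit, so `pid` is stale.
# It stands in for a connection pid handed out just before the node went down.
{pid, ref} = spawn_monitor(fn -> :ok end)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# Calling a dead pid exits the caller instead of returning {:error, _}.
# We catch the exit here only to inspect its shape.
result =
  try do
    :gen_statem.call(pid, :any_request, :infinity)
  catch
    :exit, reason -> {:caught_exit, reason}
  end

IO.inspect(result, label: "exit reason")
```

Without the `try`/`catch`, the calling process would terminate, which matches the trace above.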

@whatyouhide
Owner

Ah, gotcha, yes this makes sense. @harunzengin I think the solution here is to guard against exits when calling Xandra.Connection.execute/4. The thing I’m trying to figure out is where to guard against this. We could do it in Xandra.Connection.execute/4 itself, but that worries me because it applies to non-cluster connections too (which should not go down).

An alternative is to do it in places like this, where instead of calling Xandra.Connection.execute/4 directly we wrap the call. Something like:

    with_conn_and_retrying(cluster, options, fn conn ->
      try do
        Xandra.execute(conn, query, params, options_without_retry_strategy)
      catch
        # IIRC this is what it looks like but this needs to be tested.
        :exit, {:noproc, _} ->
          {:error, ...}
      end
    end)

Thoughts? Can you work on a PR? I won't have time this week.
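As a rough, self-contained illustration of the wrapping idea (not Xandra's actual implementation), the sketch below converts a "no process" exit into an error tuple and then moves on to the next candidate. Everything in it is hypothetical: the `ExitGuardSketch` module, `guard_noproc/1`, `first_alive/2`, and the `:connection_down` and `:no_connections` terms are made up for this example, and a plain `Agent` plays the role of a healthy connection.

```elixir
defmodule ExitGuardSketch do
  # Hypothetical helper, not part of Xandra: runs `fun` and turns a
  # "no process" exit into an error tuple the caller can act on.
  def guard_noproc(fun) when is_function(fun, 0) do
    try do
      fun.()
    catch
      :exit, {:noproc, _call_info} -> {:error, :connection_down}
    end
  end

  # Hypothetical stand-in for "retry on another node": try each pid in order
  # and stop at the first call that does not hit a guarded exit.
  def first_alive(pids, fun) when is_function(fun, 1) do
    Enum.reduce_while(pids, {:error, :no_connections}, fn pid, _acc ->
      case guard_noproc(fn -> fun.(pid) end) do
        {:error, :connection_down} = error -> {:cont, error}
        other -> {:halt, other}
      end
    end)
  end
end

# A stale pid (the "connection" that just went down)...
{dead, ref} = spawn_monitor(fn -> :ok end)

receive do
  {:DOWN, ^ref, :process, ^dead, _reason} -> :ok
end

# ...and a live Agent standing in for a healthy connection.
{:ok, alive} = Agent.start_link(fn -> :pong end)

reply = ExitGuardSketch.first_alive([dead, alive], fn pid -> Agent.get(pid, & &1) end)
IO.inspect(reply, label: "reply")
```

Only the `{:noproc, _}` exit is caught here, so other exits (timeouts, crashes mid-request) still propagate. Whether those should be guarded too is a separate design decision.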

@whatyouhide changed the title from "Shutting down Cassandra Node causes exits" to "Shutting down Cassandra node causes process exit" on Mar 13, 2024
@whatyouhide
Owner

@harunzengin ping 🙃
