Shutting down Cassandra node causes process exit #359

harunzengin · 2024-03-05T15:30:39Z

While testing #358 and shutting nodes down, I noticed that we get exits like the following when a node is shut down:

```
** (stop) exited in: :gen_statem.call(#PID<0.5947.0>, {:checkout_state_for_next_request, #Reference<0.0.460547.2884031648.17891330.200184>}, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (stdlib 5.2) gen.erl:246: :gen.do_call/4
    (stdlib 5.2) gen_statem.erl:923: :gen_statem.call/3
    (xandra 0.18.1) lib/xandra/connection.ex:158: Xandra.Connection.execute/4
    (xandra 0.18.1) lib/xandra.ex:1272: Xandra.execute_without_retrying/4
    (xandra 0.18.1) lib/xandra/retry_strategy.ex:309: Xandra.RetryStrategy.run_on_cluster/5
```

My guess is that the connection processes get terminated right after Xandra.Cluster.Pool.checkout returns the connection pids. The subsequent :gen_statem.call/3 to a dead pid then exits, which terminates the client (calling) process as well. I think this means the RetryStrategy never gets a chance to try the query on another node.
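The failure mode can be reproduced without Cassandra. The sketch below is a hypothetical stand-in: `DemoConn` is an illustrative gen_statem, not Xandra's connection module, and the request atom is borrowed from the stack trace only for recognizability. It obtains a pid (as a checkout would), stops the process, and then calls it, showing that the caller exits with `{:noproc, {:gen_statem, :call, _}}` unless it catches the exit.

```elixir
# Hypothetical stand-in for a connection process (NOT Xandra's module):
# a bare-bones gen_statem that answers every call with :ok.
defmodule DemoConn do
  @behaviour :gen_statem

  def start, do: :gen_statem.start(__MODULE__, nil, [])

  @impl true
  def callback_mode, do: :state_functions

  @impl true
  def init(nil), do: {:ok, :connected, nil}

  # State function for the :connected state: reply :ok to any call.
  def connected({:call, from}, _request, data) do
    {:keep_state, data, [{:reply, from, :ok}]}
  end
end

# 1. "Check out" the pid, standing in for what a pool checkout returns.
{:ok, pid} = DemoConn.start()

# 2. The node goes down: the connection process stops before we call it.
#    :gen_statem.stop/1 is synchronous, so the process is dead afterwards.
:ok = :gen_statem.stop(pid)

# 3. Calling the dead pid exits the *caller*; here we catch it to observe it.
result =
  try do
    :gen_statem.call(pid, :checkout_state_for_next_request, :infinity)
  catch
    :exit, {:noproc, {:gen_statem, :call, _args}} -> :caller_would_have_exited
  end

IO.inspect(result)
# prints :caller_would_have_exited
```

Without the `try`/`catch`, step 3 would crash the calling process with the same `** (EXIT) no process` reason seen in the trace.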


whatyouhide · 2024-03-13T13:10:54Z

Ah, gotcha, yes this makes sense. @harunzengin I think the solution here is to guard against exits when calling Xandra.Connection.execute/4. The thing I’m trying to figure out is where to guard against this. We could do it in Xandra.Connection.execute/4 itself, but that worries me because it applies to non-cluster connections too (which should not go down).

An alternative is to do it in places like this, where we wrap the call instead of calling Xandra.Connection.execute/4 directly. Something like:

```elixir
with_conn_and_retrying(cluster, options, fn conn ->
  try do
    Xandra.execute(conn, query, params, options_without_retry_strategy)
  catch
    # IIRC this is what it looks like but this needs to be tested.
    :exit, {:noproc, _} ->
      {:error, ...}
  end
end)
```
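To illustrate how catching `:noproc` lets a retry move on to another node, here is a minimal sketch assuming the caller already holds a list of checked-out connection pids. `HypotheticalRetry.run_on_pids/2` and the `{:error, :no_live_connection}` tuple are invented for illustration and are not part of Xandra; Agents stand in for connection processes.

```elixir
defmodule HypotheticalRetry do
  # Sketch only, not Xandra's API: tries `fun` against each pid in order,
  # treating a :noproc exit as "this connection is gone, try the next one".
  def run_on_pids([], _fun), do: {:error, :no_live_connection}

  def run_on_pids([pid | rest], fun) do
    try do
      fun.(pid)
    catch
      # The connection died between checkout and the call.
      :exit, {:noproc, _} -> run_on_pids(rest, fun)
    end
  end
end

# Usage: two stand-in "connections"; the first dies right after checkout.
{:ok, dead} = Agent.start(fn -> :node_a end)
{:ok, alive} = Agent.start(fn -> :node_b end)
:ok = Agent.stop(dead)

result = HypotheticalRetry.run_on_pids([dead, alive], fn pid -> Agent.get(pid, & &1) end)
IO.inspect(result)
# prints :node_b
```

Note that this only covers a connection that is already dead when the call starts. A connection that dies *during* a call exits the caller with that process's own exit reason (for example `:shutdown`), which this sketch does not catch.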

Thoughts? Can you work on a PR? I won't have time this week.

whatyouhide · 2024-03-28T08:29:13Z

@harunzengin ping 🙃

whatyouhide mentioned this issue Mar 6, 2024

Clean up timeouts in the conn and add :max_concurrent_requests_per_connection #358

Merged

whatyouhide changed the title ~~Shutting down Cassandra Node causes exits~~ Shutting down Cassandra node causes process exit Mar 13, 2024

whatyouhide added the Kind:Bug label Mar 13, 2024

harunzengin mentioned this issue Apr 19, 2024

Catch noproc errors #365

Merged

whatyouhide closed this as completed in #365 Apr 19, 2024
