Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't crash stream_status cmd if member node is down #8270

Merged
merged 4 commits into from
May 23, 2023

Conversation

gomoripeti
Copy link
Contributor

@gomoripeti gomoripeti commented May 23, 2023

Proposed Changes

This PR is based on #8269, but the last two commits apply cleanly on v3.11.x as well.

  • commit "Don't crash stream_status cmd if member node is down" fixes a crash when one of the member nodes of a stream is down
    (started 3 node cluster locally with make start-cluster, created a stream queue sq1 with 3 members, one on each node, then stoppped rabbit-3)
sbin/rabbitmq-queues --node rabbit-1@Peters-MacBook-Pro stream_status sq1
Status of stream sq1 on node rabbit-1@Peters-MacBook-Pro ...
Error:
{:function_clause, [{:rabbit_stream_queue, :"-get_counters/1-fun-4-", [{:replica, %{node: :"rabbit-2@Peters-MacBook-Pro", offset: 1, chunks: 2, epoch: 1, first_offset: 0, readers: 0, segments: 1, committed_offset: 1, first_timestamp: 0, forced_gcs: 2800, packets: 2}}, %{node: :"rabbit-3@Peters-MacBook-Pro"}], [file: 'rabbit_stream_queue.erl', line: 686]}, {:lists, :sort, 2, [file: 'lists.erl', line: 1210]}, {:rabbit_stream_queue, :status, 2, [file: 'rabbit_stream_queue.erl', line: 661]}]}

The crash was revealed by PR #6442, before that the map returned by the badrpc case was implicitly filtered out by the list comprehension in status/2.

  • commit "Extend stream_status to list down members as well" makes stream_status command also print a line for each member which is down, with undefined counter values
% sbin/rabbitmq-queues --node rabbit-2@`hostname -s` stream_status sq1
Status of stream sq1 on node rabbit-2@Peters-MacBook-Pro ...
┌─────────┬─────────────────────────────┬───────────┬───────────┬──────────────────┬──────────────┬───────────┬───────────┐
│ role    │ node                        │ epoch     │ offset    │ committed_offset │ first_offset │ readers   │ segments  │
├─────────┼─────────────────────────────┼───────────┼───────────┼──────────────────┼──────────────┼───────────┼───────────┤
│ replica │ rabbit-1@Peters-MacBook-Pro │ 5         │ 2         │ 2                │ 0            │ 0         │ 1         │
├─────────┼─────────────────────────────┼───────────┼───────────┼──────────────────┼──────────────┼───────────┼───────────┤
│ writer  │ rabbit-2@Peters-MacBook-Pro │ 5         │ 2         │ 2                │ 0            │ 1         │ 1         │
├─────────┼─────────────────────────────┼───────────┼───────────┼──────────────────┼──────────────┼───────────┼───────────┤
│ replica │ rabbit-3@Peters-MacBook-Pro │ undefined │ undefined │ undefined        │ undefined    │ undefined │ undefined │
└─────────┴─────────────────────────────┴───────────┴───────────┴──────────────────┴──────────────┴───────────┴───────────┘

Let me know if this second change is not desirable, and stream_status should only print reachable members.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

Further Comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc.

`rabbit_steam_coordinator:stream_overview/1` returns the member in
different format than `members/1`, resulting in `is_writer/1` never
matching. This was changed in PR rabbitmq#6440. This prevents getting counters
from stream writer last as intended by PR rabbitmq#6442.
@michaelklishin michaelklishin added this to the 3.12.0 milestone May 23, 2023
@michaelklishin michaelklishin merged commit a5776a0 into rabbitmq:main May 23, 2023
@michaelklishin
Copy link
Member

@Mergifyio backport v3.12.x

@mergify
Copy link

mergify bot commented May 23, 2023

backport v3.12.x

✅ Backports have been created

michaelklishin added a commit that referenced this pull request May 23, 2023
Don't crash stream_status cmd if member node is down (backport #8270)
@michaelklishin
Copy link
Member

Backported manually to v3.11.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants