Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: message cache unordered removal #2682

Merged
merged 1 commit into from
May 9, 2024

Conversation

SionoiS
Copy link
Contributor

@SionoiS SionoiS commented May 8, 2024

Fixes a crash in message cache because the code assumed messages to be removed were ordered.

I also added a basic fuzzing test.

Closes #2680

@SionoiS SionoiS self-assigned this May 8, 2024
Copy link

github-actions bot commented May 8, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2682-rln-v1

Built from 18916e6ea0c7283b0e9ec7dbc5d7353c72af3523

Copy link

github-actions bot commented May 8, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2682-rln-v2

Built from 18916e6ea0c7283b0e9ec7dbc5d7353c72af3523

Copy link
Contributor

@gabrielmer gabrielmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@vpavlin
Copy link
Member

vpavlin commented May 8, 2024

quay.io/wakuorg/nwaku-pr:2682-rln-v1

@erhant If you feel like it, you could use this image to check if it actually fixes your issues

@erhant
Copy link

erhant commented May 8, 2024

quay.io/wakuorg/nwaku-pr:2682-rln-v1

@erhant If you feel like it, you could use this image to check if it actually fixes your issues

awesome, thanks!

@erhant
Copy link

erhant commented May 8, 2024

update: the new image above worked like charm, great work!

For the record, we were able to reproduce the bug with @anilaltuner as follows:

  1. he and I run a node at the same time (using the same wallet if that makes a difference). we have to content topics heartbeat and synthesis, both topics are published by him and subscribed by my node.
  2. his node keeps sending heartbeat messages, to which my node responds with a unique content topic w.r.t the payload; think of this like simple ping-pong for health check
  3. at some point, he sends my node a synthesis message which includes a prompt for LLM, I obtain the result from LLM and send my payload back; note that these are two threads, so heartbeat checks are going on in the background while I do my synthesis work
  4. around 2-3 seconds after I see my log of "sending message" for synthesis, I see that heartbeat checks are failing because the server is down, and thats how I see nwaku is closed due to error
  5. looking at the logs, its the same error (e.g. 2024-05-08 18:46:59 Error: unhandled exception: index 28 not in 0 .. 24 [IndexDefect])

we are working with https://github.com/firstbatchxyz/dkn-compute-node for the context*

anyways, im confident that this is fixed now

Copy link
Contributor

@NagyZoltanPeter NagyZoltanPeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I say LGTM? Yes it is, but this code definetely lacks some explanatory comments, like why the descendant sorting needed.
Sometime we might think of implementing a multi index container, that can simplify such code a lot.

@SionoiS
Copy link
Contributor Author

SionoiS commented May 9, 2024

Shall I say LGTM? Yes it is, but this code definetely lacks some explanatory comments, like why the descendant sorting needed. Sometime we might think of implementing a multi index container, that can simplify such code a lot.

I went with the easy fix and I agree that the message cache impl. should be rewritten.

@SionoiS SionoiS merged commit fa26d05 into master May 9, 2024
14 of 16 checks passed
@SionoiS SionoiS deleted the fix--message-cache-unordered-remove branch May 9, 2024 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: message_cache misses index while delete from cache
5 participants