fix: guard concurrent accesses to node api #412
Merged
Following an investigation by @placer14, we suspect the panic in #411 is caused by concurrent modification of the node field in the actor state task. This code was sequential when originally written, but recent changes introduced concurrent access to the field, namely through the Close method. I suspect this happens when another task fails while an actor state task is still spawning goroutines: the indexer detects the failure and closes all lenses used by the other running tasks, which sets the node field to nil. The still-running task then passes that nil value to a new goroutine.
This change guards accesses to the node field with a mutex in each of the tasks. Before processing, the actor state task checks whether the API lens is available. If it isn't, the task returns a non-fatal error for each actor, which is logged in visor_processing_reports for that tipset+task+actor combination.
Also removed the unused node field from the messages task.
Fixes #411