fix: guard concurrent accesses to node api #412

iand · 2021-03-09T12:21:41Z

Following an investigation by @placer14, we suspect the panic in #411 is caused by concurrent modification of the node field in the actor state task. When this code was originally written it ran sequentially, but recent changes have introduced concurrent access to the field, namely when the Close method is called. I speculate this can happen when another task fails while an actor state task is still spawning goroutines: the indexer detects the failure and closes all lenses used by the other running tasks, which sets the node field to nil, and the still-running task then passes that nil node to a new goroutine.

This change guards accesses to the node field with mutexes in each of the tasks. Before processing, the actor state task checks whether the API lens is still available; if it isn't, the task returns a non-fatal error for each actor, which will be logged in visor_processing_reports for that tipset + task + actor combination.

This change also removes the node field from the messages task, since it is not used.

Fixes #411

placer14 · 2021-03-09T17:54:16Z

Thanks for putting this together, Ian.

86ad716 fix: guard concurrent accesses to node api

iand requested a review from placer14 March 9, 2021 12:21

placer14 approved these changes Mar 9, 2021


frrist approved these changes Mar 9, 2021


placer14 merged commit de2eda0 into master Mar 9, 2021

placer14 deleted the fix/avoid-panic-closed-lens branch March 9, 2021 23:33
